#research-integrity News & Analysis

11 articles tagged with #research-integrity. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 11 · 7/10

Irresponsible AI: big tech's influence on AI research and associated impacts

A research paper argues that major technology companies' dominant influence in AI development is driving irresponsible practices that prioritize scaling and profit over ethical, sustainable, and environmentally conscious AI systems. The authors trace negative societal and environmental impacts of AI to big tech's business incentives and call for collective action from researchers to counter this trend.

AI · Bearish · arXiv – CS AI · Jun 4 · 7/10

Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research

Researchers have developed PEEL (Protocols for Epistemically Engaged Literacy in AI), a framework combining deterministic distant reading tools with LLM interpretation to measure and expose systematic distortions in AI-generated text summaries. The framework reveals that large language models introduce undetectable errors in quantity, term frequency, and epistemic voice, challenging the assumption that AI fluency equals fidelity and raising critical questions about researcher accountability in AI-assisted scholarship.

🧠 Claude

AI · Bearish · arXiv – CS AI · Jun 4 · 7/10

Position: State-of-the-Art Claims Require State-of-the-Art Evidence

Researchers identify a widespread gap between State-of-the-Art claims in AI/ML research and the evidence supporting them. Analysis of ten major benchmarks reveals that marginal improvements in aggregate scores often mask fragility, with gains driven by outlier datasets rather than meaningful superiority across tasks.

AI · Bearish · arXiv – CS AI · May 29 · 7/10

SciIntBench: Measuring LLM Compliance with Research Integrity Norms Under Adversarial Framing

Researchers introduced SciIntBench, a benchmark testing whether large language models uphold research integrity norms across 810 adversarial prompts. The study of 16 LLMs found that models reliably refuse explicit misconduct but fail significantly when unethical requests are framed covertly or as pressure-driven shortcuts, raising concerns about LLM deployment in scientific research.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

ScientistOne introduces Chain-of-Evidence, a verifiability framework addressing critical failures in autonomous research systems, where AI agents produce plausible-looking but unreliable outputs, including fabricated citations, unverified scores, and misaligned methods. The system achieves zero hallucinated references and perfect score verification across five research tasks, significantly outperforming existing baseline systems that exhibit systematic failure rates of up to 80%.

AI · Neutral · arXiv – CS AI · May 27 · 7/10

Workflow Closure Is Not Scientific Closure in Auto-Research Systems

A research paper argues that autonomous AI research systems achieving workflow closure—completing full research cycles internally—do not achieve scientific closure without external validation and oversight. The authors identify three systemic failure patterns in 21 surveyed systems: objective collapse, validation collapse, and acceptance collapse, proposing design remedies to ensure AI-generated research maintains scientific integrity.

AI · Bearish · arXiv – CS AI · Apr 20 · 7/10

ASMR-Bench: Auditing for Sabotage in ML Research

Researchers introduced ASMR-Bench, a benchmark for detecting sabotage in ML research codebases, revealing that current frontier LLMs and human auditors struggle to identify subtle implementation flaws that produce misleading results. The study found that even the best-performing model (Gemini 3.1 Pro) achieved only 77% AUROC and a 42% fix rate, highlighting critical vulnerabilities in AI-assisted research validation.

🧠 Gemini

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Report on CHIIR 2026 Workshop on Generative AI and Academic Search (GAI&AS)

The CHIIR 2026 Workshop on Generative AI and Academic Search convened researchers to examine how GenAI is transforming academic research systems beyond traditional document retrieval. Discussions centered on three themes—foundations, applications, and search-as-learning—emphasizing human-centered design principles that prioritize research integrity, transparency, and higher-order cognitive support.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

TSFMAudit: Data Contamination Auditing in Forecasting Time Series Foundation Models

Researchers introduce TSFMAudit, the first systematic method for detecting data contamination in time series foundation models (TSFMs) pretrained on large datasets. The approach identifies contamination by analyzing how quickly models adapt to evaluation data, with contaminated datasets showing unusually efficient loss reduction and minimal backbone movement during fine-tuning.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10

Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation

A comprehensive survey examines how large multimodal language models are transforming scientific research across five key areas: literature search, idea generation, content creation, multimodal artifact production, and peer review evaluation. The research highlights both the potential for AI-assisted scientific discovery and the ethical concerns regarding research integrity and misuse of generative models.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10

Real Money, Fake Models: Deceptive Model Claims in Shadow APIs

A systematic audit of 17 shadow APIs used in 187 academic papers reveals widespread deception, with performance divergence of up to 47.21% and identity verification failures in 45.83% of tests. These third-party services claim to provide access to frontier LLMs like GPT-5 and Gemini-2.5 but deliver inconsistent outputs, undermining research validity and reproducibility.