LLM hallucinations in the wild: Large-scale evidence from non-existent citations
Researchers auditing 2.5 million scientific papers found 146,932 hallucinated citations in 2025 alone, with non-existent references surging sharply after LLM adoption. The errors concentrate in AI-heavy fields and papers with linguistic signatures of AI assistance, while current journal moderation fails to catch most instances, threatening scientific integrity and reinforcing existing biases in academic credit attribution.
This research exposes a critical vulnerability in how AI systems are reshaping knowledge production. By analyzing verifiable objects (scientific citations), researchers quantified what was previously anecdotal: LLMs are systematically generating false references at scale. The 146,932 hallucinated citations identified in 2025 represent a measurable erosion of scientific trustworthiness, accumulating faster than existing editorial safeguards can detect or correct.
The phenomenon reflects a broader adoption curve where researchers increasingly use LLMs to draft, edit, and synthesize manuscripts without adequately verifying generated content. The concentration of hallucinations in AI-intensive fields and among early-career teams suggests both high AI adoption rates and lower verification capacity. Importantly, false citations disproportionately benefit already-prominent and male researchers, meaning LLM errors mechanically amplify existing inequities rather than correcting them.
For the AI industry, this evidence demonstrates a fundamental reliability problem that undermines enterprise deployment claims. Enterprises deploying LLMs in knowledge-work domains such as legal research, medical literature reviews, and financial analysis face similar citation and fact-checking risks, weakening the argument that scaling and better training alone will solve accuracy problems.
Looking forward, this creates pressure for structural solutions: mandatory AI disclosure in manuscripts, enhanced citation verification systems, and probabilistic confidence scoring in LLM outputs. Publishers and platforms will face liability questions if they knowingly host hallucinated citations. The research suggests that AI adoption has outpaced verification infrastructure by months or years, creating a widening credibility gap that could fuel skepticism about AI-assisted knowledge work across sectors.
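To make the "enhanced citation verification" point concrete, below is a minimal sketch of how a reference checker might flag non-existent citations. It assumes references already carry DOIs and uses the public Crossref REST API as ground truth; the function names and sample DOIs are illustrative, and this is not the method the researchers used.

```python
"""Minimal citation-existence check (illustrative sketch).

Assumptions (not from the source article): references carry DOIs, and the
public Crossref REST API (https://api.crossref.org) serves as ground truth.
"""

import requests

CROSSREF_WORKS = "https://api.crossref.org/works/"


def doi_exists(doi: str, timeout: float = 10.0) -> bool:
    """Return True if Crossref resolves the DOI, False if it returns 404."""
    resp = requests.get(CROSSREF_WORKS + doi, timeout=timeout)
    if resp.status_code == 200:
        return True
    if resp.status_code == 404:
        return False
    # Rate limits or server errors: surface them rather than guess.
    resp.raise_for_status()
    return False


def flag_suspect_references(dois: list[str]) -> list[str]:
    """Return the subset of DOIs that Crossref does not recognize."""
    return [doi for doi in dois if not doi_exists(doi)]


if __name__ == "__main__":
    sample = [
        "10.1038/nature14539",         # a real, well-known DOI
        "10.9999/definitely-not-real"  # a made-up DOI that should be flagged
    ]
    print(flag_suspect_references(sample))
```

A production system would also need to handle references without DOIs (for example, by fuzzy-matching titles and authors against bibliographic databases), respect API rate limits, and catch near-miss metadata, which is where hallucinated citations can be hardest to detect.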
- LLMs are generating over 146,000 false citations annually, concentrated in fields with rapid AI adoption and among early-career researchers
- Current journal moderation and preprint screening catch only a fraction of hallucinated references, indicating verification infrastructure lags adoption
- False citations disproportionately boost credit for already-prominent scholars, mechanically reinforcing existing gender and prestige biases in academia
- The surge in non-existent references threatens scientific reliability at scale and creates liability risks for publishers and platforms
- Enterprises deploying LLMs in knowledge-work domains face similar accuracy verification challenges with material business and reputational risks