#llm-hallucinations News & Analysis

14 articles tagged with #llm-hallucinations. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

TTFT-Aware Graph Chain-of-Thought: Distance-Indexed Neural A* for Low-Hallucination Multi-Hop Medical Reasoning

Researchers present a production-grade GraphRAG system for medical LLMs that reduces hallucinations by constraining answers to verifiable paths within a 700K-node medical knowledge graph. Using Pruned Landmark Labeling for distance indexing and AStarNet heuristics for path search, the system improves clinical reasoning accuracy while reducing both latency, including time to first token (TTFT), and hallucination rates in a fertility-assistant application.
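
The summary names Pruned Landmark Labeling and AStarNet but gives no implementation details. As a rough illustration of how a precomputed distance index can guide path search over a knowledge graph, here is a minimal sketch of classic A* with landmark-based lower bounds (the ALT heuristic). This is a simpler, swapped-in technique, not the paper's method; the graph, node names, and choice of landmark are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Undirected weighted graph: node -> {neighbour: edge weight}.
Graph = Dict[str, Dict[str, float]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Exact shortest-path distances from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def landmark_heuristic(landmark_dists: List[Dict[str, float]], goal: str) -> Callable[[str], float]:
    """ALT lower bound: by the triangle inequality, |d(L, goal) - d(L, n)| <= d(n, goal)."""
    def h(node: str) -> float:
        return max(
            (abs(d[goal] - d[node]) for d in landmark_dists if node in d and goal in d),
            default=0.0,
        )
    return h


def astar(graph: Graph, start: str, goal: str,
          h: Callable[[str], float]) -> Tuple[Optional[List[str]], float]:
    """A* search; returns (path, cost), or (None, inf) if goal is unreachable."""
    g = {start: 0.0}
    parent: Dict[str, Optional[str]] = {start: None}
    heap = [(h(start), start)]
    closed = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in closed:
            continue
        if u == goal:
            # Walk parent pointers back to the start to rebuild the path.
            path = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], g[goal]
        closed.add(u)
        for v, w in graph[u].items():
            ng = g[u] + w
            if ng < g.get(v, float("inf")):
                g[v] = ng
                parent[v] = u
                heapq.heappush(heap, (ng + h(v), v))
    return None, float("inf")


# Hypothetical toy knowledge graph (node names are illustrative only).
graph: Graph = {
    "symptom": {"condition_a": 1, "condition_b": 4},
    "condition_a": {"symptom": 1, "condition_b": 2, "treatment": 5},
    "condition_b": {"symptom": 4, "condition_a": 2, "treatment": 1},
    "treatment": {"condition_a": 5, "condition_b": 1},
}
landmarks = [dijkstra(graph, "symptom")]  # precomputed once, reused for every query
path, cost = astar(graph, "symptom", "treatment", landmark_heuristic(landmarks, "treatment"))
print(path, cost)  # ['symptom', 'condition_a', 'condition_b', 'treatment'] 4.0
```

Landmark distances are computed once offline; each query then gets an admissible heuristic at the cost of a few table lookups, which is the general reason distance indexing can cut search latency.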

AI · Bullish · arXiv – CS AI · 6d ago · 7/10

Detecting Hallucinations for Large Language Model-based Knowledge Graph Reasoning

Researchers introduce LUCID, a novel hallucination detection method for large language models used in knowledge graph reasoning tasks. By combining LLM attention scores, knowledge graph semantics, and structural information through graph neural networks, LUCID achieves state-of-the-art performance across nine datasets, addressing a critical reliability gap in AI-driven knowledge systems.

AI · Neutral · arXiv – CS AI · Jun 11 · 7/10

From Architecture to Output: Structural Origins of Hallucination in Large Language Models and the Amplifying Role of Data

Researchers identify three core architectural mechanisms in large language models that systematically produce hallucinations: self-attention's statistical confusion of entities, maximum likelihood training that rewards plausible-sounding falsehoods, and autoregressive decoding that cascades errors forward. Dataset quality issues amplify rather than originate these failures, suggesting that fixing hallucinations requires architectural redesign, not just better training data.
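
The error-cascade argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each generated token is wrong independently with a fixed probability; it is not a model taken from the paper.

```python
def prob_error_free(per_token_error: float, length: int) -> float:
    """Probability that every one of `length` tokens is correct, assuming each
    token errs independently with probability `per_token_error`.

    Independence is a simplifying assumption: in real autoregressive decoding
    each token is conditioned on the tokens before it, so an early error can
    steer everything that follows.
    """
    if not 0.0 <= per_token_error <= 1.0:
        raise ValueError("per_token_error must be in [0, 1]")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - per_token_error) ** length


# How quickly a small per-token error rate compounds with answer length.
for n in (10, 100, 1000):
    print(n, round(prob_error_free(0.01, n), 4))
```

With a 1% per-token error rate, the chance that a 100-token answer is entirely correct is already only about 0.37.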

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Mitigating Hallucinations in Large Language Models Via Decoder Layer Skipping

Researchers introduce DeLask, a novel decoding framework that reduces hallucinations in Large Language Models by dynamically skipping decoder layers prone to generating false information. The method uses gradient-based analysis to identify problematic layers and partially aggregates their hidden states, demonstrating consistent improvements across diverse LLMs without requiring model retraining.
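
To make "partially aggregates their hidden states" concrete, here is a minimal sketch under one plausible reading: for a flagged layer, the new hidden state is a convex blend of the layer's input and its output. The gradient-based selection of which layers to skip is not reproduced; `skip` and `alpha` are hypothetical inputs, and the lambdas stand in for real decoder blocks.

```python
from typing import Callable, Sequence, Set

import numpy as np

Layer = Callable[[np.ndarray], np.ndarray]


def forward_with_partial_skips(
    h: np.ndarray, layers: Sequence[Layer], skip: Set[int], alpha: float = 0.5
) -> np.ndarray:
    """Run `layers` in order. For layer indices in `skip`, blend the layer's
    input and output instead of taking the output outright:
        h <- alpha * h + (1 - alpha) * layer(h)
    alpha = 1.0 skips the layer entirely; alpha = 0.0 runs it normally.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    for i, layer in enumerate(layers):
        out = layer(h)
        h = alpha * h + (1.0 - alpha) * out if i in skip else out
    return h


# Toy "layers" standing in for decoder blocks (hypothetical).
layers = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: x - 0.5]
h0 = np.zeros(3)
print(forward_with_partial_skips(h0, layers, skip=set()))
print(forward_with_partial_skips(h0, layers, skip={1}, alpha=0.5))
```

Because no weights change, a scheme of this shape only alters inference, which is consistent with the summary's point that no retraining is needed.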

AI · Neutral · arXiv – CS AI · May 27 · 7/10

Why LLMs Hallucinate on Structured Knowledge: A Mechanistic Analysis of Reasoning over Linearized Representations

Researchers have identified the mechanistic causes of hallucinations in large language models when reasoning over structured knowledge like graphs and tables. The study reveals that hallucinations stem from systematic failures in attention allocation and semantic grounding in feed-forward layers, rather than random errors, with findings applicable across multiple structured knowledge formats.

AI · Bearish · arXiv – CS AI · May 27 · 7/10

Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination

Researchers challenge the assumption that uncertainty estimation methods can reliably detect LLM hallucinations, finding highly variable and often weak associations across different hallucination types. The study evaluates multiple uncertainty quantification approaches against intrinsic and extrinsic hallucinations, revealing that uncertainty signals may not consistently indicate model failures.
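
A common way to test whether an uncertainty signal detects hallucinations is to score each answer (here by predictive entropy) and measure AUROC against hallucination labels. The sketch below shows that generic evaluation recipe; the data are hypothetical, and the paper's exact estimators and metrics may differ.

```python
import math
from typing import Sequence


def predictive_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token or answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Chance that a random hallucinated example (label 1) receives a higher
    uncertainty score than a random faithful one (label 0); ties count as 0.5.
    0.5 means the score carries no signal; 1.0 means perfect separation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical distributions and labels (1 = hallucinated), for illustration only.
dists = [[0.9, 0.1], [0.5, 0.5], [0.6, 0.4], [0.97, 0.03]]
labels = [0, 1, 0, 1]
scores = [predictive_entropy(d) for d in dists]
print(round(auroc(scores, labels), 2))
```

Here the AUROC comes out at 0.5, chance level, because one hallucinated answer was produced with high confidence (entropy about 0.13). Confidently wrong outputs like this are exactly what weaken the link between uncertainty and failure.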

AI · Bearish · arXiv – CS AI · May 11 · 7/10

LLM hallucinations in the wild: Large-scale evidence from non-existent citations

Researchers auditing 2.5 million scientific papers found 146,932 hallucinated citations in 2025 alone, with non-existent references surging sharply after LLM adoption. The errors concentrate in AI-heavy fields and papers with linguistic signatures of AI assistance, while current journal moderation fails to catch most instances, threatening scientific integrity and reinforcing existing biases in academic credit attribution.

AI · Neutral · arXiv – CS AI · May 11 · 7/10

A Geometric Taxonomy of Hallucinations in LLMs

Researchers propose a geometric framework for detecting hallucinations in large language models by analyzing embedding space structure, categorizing three types of errors with different detectability profiles. The approach outperforms standard NLI baselines on expert-annotated datasets, providing interpretable diagnostics for production systems operating under black-box constraints.

AI · Bearish · arXiv – CS AI · Jun 9 · 6/10

Evaluating Hallucinations in Domain-Adapted Large Language Models

Researchers investigating hallucinations in fine-tuned Large Language Models found that domain adaptation via fine-tuning alone is insufficient to prevent inaccurate outputs. Testing Llama-2 with domain-specific data revealed the model struggles with novel reasoning tasks and tends to over-generate information, highlighting fundamental limitations in current LLM adaptation techniques.

🧠 Llama

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Geometry-Aware Hallucination Detection in Large Language Models

Researchers introduce GA-ICL, a geometry-aware framework that improves hallucination detection in large language models by selecting better in-context learning demonstrations. Rather than relying on surface-level text similarity, the method uses latent representations and prototype geometry to choose demonstrations, achieving stronger performance across factual verification and hallucination detection benchmarks while maintaining robustness across model scales.
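
The summary does not spell out GA-ICL's geometric criteria. As a hedged illustration of choosing demonstrations in latent space rather than by surface text, the sketch below scores each candidate by cosine similarity to the query plus similarity to its label's prototype (mean embedding). The scoring rule, the weight `lam`, and the toy embeddings are assumptions made for illustration.

```python
from typing import List, Sequence

import numpy as np


def _unit(x: np.ndarray) -> np.ndarray:
    """L2-normalise along the last axis (eps guards against zero vectors)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)


def select_demos(query, cand_emb, cand_labels: Sequence[str],
                 k_per_label: int = 1, lam: float = 0.5) -> List[int]:
    """Pick k demonstrations per label. Each candidate is scored by cosine
    similarity to the query plus `lam` times its cosine similarity to its own
    label prototype, favouring examples that are both close to the query and
    typical of their class."""
    q = _unit(np.asarray(query, dtype=float))
    X = _unit(np.asarray(cand_emb, dtype=float))
    labels = np.asarray(cand_labels)
    chosen: List[int] = []
    for lab in np.unique(labels):  # labels visited in sorted order
        idx = np.flatnonzero(labels == lab)
        proto = _unit(X[idx].mean(axis=0))
        score = X[idx] @ q + lam * (X[idx] @ proto)
        chosen.extend(int(i) for i in idx[np.argsort(-score)[:k_per_label]])
    return chosen


# Hypothetical 2-D "latent" embeddings; real ones would come from model hidden states.
cand_emb = [[1.0, 0.1], [0.0, 1.0], [0.9, -0.2], [0.1, -1.0]]
cand_labels = ["faithful", "faithful", "hallucinated", "hallucinated"]
print(select_demos([1.0, 0.0], cand_emb, cand_labels))  # [0, 2]
```

Selecting per label keeps the demonstration set balanced across classes, so the prompt does not show the model only faithful or only hallucinated examples.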

AI · Bearish · arXiv – CS AI · May 28 · 6/10

Hallucination Behavior in Multimodal LLMs Across Agricultural Image Interpretation and Generation Tasks

A comprehensive study reveals that multimodal large language models exhibit significant hallucination problems in agricultural imaging tasks, with image interpretation achieving only 63-75% zero-shot accuracy and text-to-image generation producing up to 91% biologically inconsistent scenes. These findings highlight critical reliability gaps that could undermine the trustworthiness of AI-driven agricultural platforms.

🧠 GPT-5 · 🧠 Gemini

AI · Neutral · arXiv – CS AI · May 28 · 6/10

CiteCheck: Retrieval-Grounded Detection of LLM Citation Hallucinations in Scientific Text

Researchers introduce CiteCheck, a hybrid framework that detects when large language models fabricate or corrupt scientific citations by combining scholarly database retrieval with structured LLM verification. The system achieves 88.7% macro-F1 on a new 982-citation physics benchmark, outperforming GPT, Claude, and Gemini, addressing a critical reliability problem as LLMs become integrated into scientific research workflows.
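
CiteCheck's headline number is a macro-F1 score. Macro-F1 averages per-class F1 without weighting by class frequency, so a detector cannot score well by getting only the most common class right. A minimal sketch of the computation, using a hypothetical three-way label set (`valid`, `fabricated`, `corrupted`) that may not match the benchmark's actual labels:

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    classes = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)


# Hypothetical gold labels and detector predictions for four citations.
y_true = ["valid", "valid", "fabricated", "corrupted"]
y_pred = ["valid", "fabricated", "fabricated", "corrupted"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.778
```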

🧠 Claude · 🧠 Gemini

AI · Neutral · arXiv – CS AI · May 12 · 6/10

APCD: Adaptive Path-Contrastive Decoding for Reliable Large Language Model Generation

Researchers propose Adaptive Path-Contrastive Decoding (APCD), a multi-path decoding framework designed to reduce hallucinations in large language models by branching generation into multiple paths when token-level entropy is high and by controlling how the diverging prediction trajectories interact. The method improves factual accuracy across eight benchmarks while remaining computationally efficient.
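
The entropy-triggered branching decision can be sketched as follows. This covers only when to branch; APCD's control of interactions between paths is not modeled, and `threshold` and `k` are hypothetical parameters.

```python
import math
from typing import Dict, List


def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy (nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0.0)


def tokens_to_expand(probs: Dict[str, float], threshold: float, k: int = 2) -> List[str]:
    """Decode greedily (one token) when the model is confident; branch into the
    top-k tokens when entropy exceeds `threshold`."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:k] if entropy(probs) > threshold else ranked[:1]


confident = {"Paris": 0.95, "Lyon": 0.05}             # entropy ~0.20
uncertain = {"1912": 0.40, "1913": 0.35, "1915": 0.25}  # entropy ~1.08
print(tokens_to_expand(confident, threshold=0.5))  # ['Paris']
print(tokens_to_expand(uncertain, threshold=0.5))  # ['1912', '1913']
```

Branching only at high-entropy steps is what keeps the extra cost bounded: most tokens are decoded greedily, and additional paths are spent only where the model is genuinely unsure.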

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Mapping the Course for Prompt-based Structured Prediction

Researchers propose combining large language models (LLMs) with combinatorial inference to address hallucinations and improve structured prediction accuracy. The study finds that incorporating symbolic inference yields more consistent predictions than prompting alone, with calibration and fine-tuning further enhancing performance on complex tasks.
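
The benefit of adding combinatorial inference on top of prompting can be illustrated with a tiny constrained-decoding example. Taking the per-token argmax of LLM label probabilities can yield an invalid structure, whereas a joint search restricted to valid structures cannot. The BIO tagging constraint, the probabilities, and the exhaustive search are illustrative assumptions, not the paper's setup; a practical system would use dynamic programming or an ILP solver.

```python
import itertools
import math
from typing import Callable, Dict, Optional, Sequence, Tuple


def constrained_decode(
    scores: Sequence[Dict[str, float]],
    is_valid: Callable[[Tuple[str, ...]], bool],
) -> Optional[Tuple[str, ...]]:
    """Return the valid joint labelling with the highest total log-probability.
    Exhaustive search: fine for tiny inputs, exponential in general.
    Assumes all probabilities are strictly positive."""
    best, best_score = None, -math.inf
    for assign in itertools.product(*(list(s) for s in scores)):
        if not is_valid(assign):
            continue
        total = sum(math.log(scores[i][lab]) for i, lab in enumerate(assign))
        if total > best_score:
            best, best_score = assign, total
    return best


def bio_valid(tags: Tuple[str, ...]) -> bool:
    """BIO constraint: an I-X tag must follow B-X or I-X."""
    prev = "O"
    for t in tags:
        if t.startswith("I-") and prev[2:] != t[2:]:
            return False
        prev = t
    return True


# Hypothetical per-token label probabilities, e.g. elicited by prompting on "Ada Lovelace".
scores = [{"B-PER": 0.45, "O": 0.55}, {"I-PER": 0.9, "O": 0.1}]
independent = tuple(max(s, key=s.get) for s in scores)
print(independent, bio_valid(independent))    # ('O', 'I-PER') False
print(constrained_decode(scores, bio_valid))  # ('B-PER', 'I-PER')
```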