12 articles tagged with #hallucinations. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv – CS AI · 6d ago · 7/10
Researchers have conducted a comprehensive survey of hallucinations in Video Large Language Models (Vid-LLMs), identifying two core types, dynamic distortion and content fabrication, and tracing their root causes to temporal representation limitations and insufficient visual grounding. The study reviews evaluation benchmarks and mitigation strategies, and proposes future directions, including motion-aware encoders and counterfactual learning, to improve reliability.
AI · Neutral · arXiv – CS AI · Apr 14 · 7/10
Researchers identify a critical failure mode in multimodal AI reasoning models called Reasoning Vision Truth Disconnect (RVTD), where hallucinations occur at high-entropy decision points when models abandon visual grounding. They propose V-STAR, a training framework using hierarchical visual attention rewards and forced reflection mechanisms to anchor reasoning back to visual evidence and reduce hallucinations in long-chain tasks.
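The paper's implementation isn't reproduced here, but the core detection idea, flagging high-entropy decision points during decoding, can be illustrated with a minimal sketch. The entropy threshold and the use of raw next-token probabilities are assumptions for illustration, not the authors' method.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in bits) of a next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def flag_decision_points(step_probs, threshold=3.0):
    """Return indices of decoding steps whose entropy exceeds the threshold.

    These are candidate points where, per the RVTD description, a model is
    most likely to drift away from its visual grounding. The threshold value
    is an illustrative assumption, not a number from the paper.
    """
    return [i for i, probs in enumerate(step_probs)
            if token_entropy(probs) > threshold]

# Toy example: step 0 is confident, step 1 is near-uniform over 16 tokens.
steps = [
    [0.9, 0.05, 0.05],
    [1 / 16] * 16,
]
print(flag_decision_points(steps))  # -> [1]
```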
AI · Bearish · arXiv – CS AI · Apr 13 · 7/10
Researchers propose the Spectral Sensitivity Theorem to explain hallucinations in large ASR models like Whisper, identifying a phase transition between dispersive and attractor regimes. Analysis of model eigenspectra reveals that intermediate models experience structural breakdown while large models compress information, decoupling from acoustic evidence and increasing hallucination risk.
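As a rough illustration of the kind of spectrum analysis the summary describes, the sketch below computes the singular-value spectrum of a layer's weight matrix with NumPy. The "flat vs. rapidly decaying spectrum" reading is a hypothetical stand-in, not the paper's formal Spectral Sensitivity criterion.

```python
import numpy as np

def spectral_profile(weight_matrix, k=10):
    """Return the top-k singular values of a layer's weight matrix.

    Heuristically, a rapidly decaying spectrum suggests the layer compresses
    information onto a few directions (an attractor-like regime), while a
    flat spectrum suggests a more dispersive regime. This interpretation is
    an assumption for illustration only.
    """
    s = np.linalg.svd(weight_matrix, compute_uv=False)
    return s[:k]

# Toy example: a random (dispersive-looking) matrix vs. a near-rank-1 matrix.
rng = np.random.default_rng(0)
dispersive = rng.normal(size=(64, 64))
attractor = np.outer(rng.normal(size=64), rng.normal(size=64))
print(spectral_profile(dispersive, k=3))
print(spectral_profile(attractor, k=3))
```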
AI · Neutral · arXiv – CS AI · Apr 7 · 7/10
Researchers have identified two key mechanisms behind reasoning hallucinations in large language models: Path Reuse and Path Compression. The study models next-token prediction as graph search, showing how memorized knowledge can override contextual constraints and how frequently used reasoning paths become shortcuts that lead to unsupported conclusions.
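A minimal sketch of the graph-search framing, assuming a toy graph with edge-usage counts: a heavily reused edge (a memorized fact) can outscore a contextually correct successor. The graph, counts, and scoring rule are illustrative assumptions, not the paper's construction.

```python
from collections import defaultdict

# Toy "reasoning graph": successors of a node with edge-usage counts.
edges = defaultdict(dict)
edges["capital of"]["Paris"] = 120   # heavily reused path (memorized fact)
edges["capital of"]["Lyon"] = 2      # contextually correct in this prompt

def next_token(node, context_constraint=None, reuse_bias=1.0):
    """Pick the successor with the highest (usage count * bias) score.

    With a strong reuse bias, the memorized edge wins even when the prompt
    context points elsewhere -- a toy version of Path Reuse overriding
    contextual constraints.
    """
    scored = {}
    for succ, count in edges[node].items():
        score = count * reuse_bias
        if context_constraint and succ == context_constraint:
            score += 10  # weak contextual boost, easily swamped by reuse
        scored[succ] = score
    return max(scored, key=scored.get)

print(next_token("capital of", context_constraint="Lyon"))  # -> "Paris"
```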
AI · Neutral · arXiv – CS AI · Mar 17 · 7/10
Researchers introduce Distributional Semantics Tracing (DST), a new framework for explaining hallucinations in large language models by tracking how semantic representations drift across neural network layers. The method reveals that hallucinations occur when models are pulled toward contextually inconsistent concepts based on training correlations rather than actual prompt context.
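The sketch below shows the general shape of such a layer-wise trace: comparing a token's hidden state against a prompt-consistent concept vector and a distractor concept vector at each layer and watching for a crossover. The synthetic vectors and layer states are assumptions; this is not the DST authors' code.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def trace_drift(hidden_states, prompt_concept, distractor_concept):
    """Per-layer similarity of a hidden state to two concept vectors.

    A crossover (the distractor overtaking the prompt concept at some layer)
    is the kind of drift a tracing method like DST looks for.
    """
    return [(cosine(h, prompt_concept), cosine(h, distractor_concept))
            for h in hidden_states]

rng = np.random.default_rng(1)
prompt_vec, distractor_vec = rng.normal(size=(2, 32))
# Synthetic hidden states that interpolate from the prompt concept
# toward the distractor across 6 "layers".
layers = [(1 - t) * prompt_vec + t * distractor_vec
          for t in np.linspace(0.0, 1.0, 6)]
for layer, (p_sim, d_sim) in enumerate(
        trace_drift(layers, prompt_vec, distractor_vec)):
    print(f"layer {layer}: prompt={p_sim:.2f} distractor={d_sim:.2f}")
```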
AI · Bearish · arXiv – CS AI · Mar 12 · 7/10
A research study finds that LLaMA-70B-Instruct hallucinated in 19.7% of medical Q&A responses despite high plausibility scores, highlighting significant reliability issues in AI healthcare applications. The study also shows that lower hallucination rates correlate with higher usefulness scores, emphasizing the need for better safeguards in medical AI systems.
AI · Bullish · Crypto Briefing · Mar 3 · 7/10
OpenAI has released GPT-5.3 Instant for ChatGPT, featuring reduced refusals, enhanced web-based answers, and fewer hallucinations across major performance benchmarks. This update represents a significant improvement in AI model reliability and user experience.
AI · Neutral · Apple Machine Learning · Apr 13 · 6/10
Researchers present a data pruning technique that improves how large language models memorize factual knowledge by optimizing the training data distribution. The work, grounded in information-theoretic analysis, addresses the gap between theoretical model capacity and actual factual accuracy, offering practical methods to reduce hallucinations in knowledge-intensive tasks.
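The summary only says the pruning is guided by information-theoretic analysis, so the sketch below shows the generic shape of score-based data pruning with a caller-supplied importance score; the scoring heuristic is a placeholder, not the paper's criterion.

```python
def prune_by_score(examples, score_fn, keep_fraction=0.7):
    """Keep the top fraction of training examples by an importance score.

    The score function is supplied by the caller; the paper's actual
    information-theoretic criterion is not reproduced here.
    """
    ranked = sorted(examples, key=score_fn, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

# Toy usage: prefer longer facts and penalize duplicates (an illustrative heuristic).
facts = [
    "Paris is the capital of France.",
    "Paris is the capital of France.",   # a duplicate adds little information
    "The Eiffel Tower was completed in 1889.",
    "Water boils at 100 degrees Celsius at sea level.",
]
occurrences = {f: facts.count(f) for f in facts}
print(prune_by_score(facts, lambda f: len(set(f.split())) / occurrences[f]))
```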
AI · Neutral · arXiv – CS AI · Apr 10 · 6/10
Researchers have developed a method to control how detectable hallucinations in multimodal language models are, distinguishing between obvious hallucinations (easily spotted by humans) and elusive ones (harder to spot). Using a dataset of 4,470 human responses, they created targeted interventions that can adjust which types of hallucinations occur, enabling flexible control suited to different security and usability requirements.
AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
Researchers propose a new four-phase architecture to reduce AI hallucinations using domain-specific retrieval and verification systems. The framework achieved win rates up to 83.7% across multiple benchmarks, demonstrating significant improvements in factual accuracy for large language models.
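The summary names four phases but not their contents; the sketch below wires up a generic retrieve, draft, verify, revise pipeline as one plausible reading. Every component here is a stand-in (keyword retrieval, a template "draft", substring verification), not the paper's architecture.

```python
def retrieve(query, corpus, k=3):
    """Phase 1: naive keyword retrieval from a domain corpus (stand-in)."""
    scored = sorted(corpus,
                    key=lambda doc: sum(w in doc.lower()
                                        for w in query.lower().split()),
                    reverse=True)
    return scored[:k]

def draft(query, evidence):
    """Phase 2: draft an answer from retrieved evidence (stand-in for an LLM call)."""
    return f"Based on {len(evidence)} sources: {evidence[0]}"

def verify(answer, evidence):
    """Phase 3: check that the answer is supported by at least one source."""
    return any(doc in answer for doc in evidence)

def revise(query, evidence):
    """Phase 4: fall back to the strongest piece of evidence if verification fails."""
    return evidence[0]

def answer(query, corpus):
    evidence = retrieve(query, corpus)
    candidate = draft(query, evidence)
    return candidate if verify(candidate, evidence) else revise(query, evidence)

corpus = [
    "Aspirin inhibits platelet aggregation.",
    "Ibuprofen is a nonsteroidal anti-inflammatory drug.",
    "Paracetamol is commonly used for fever.",
]
print(answer("what does aspirin inhibit", corpus))
```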
AI · Bullish · Google DeepMind Blog · Dec 17 · 6/10
Researchers have introduced FACTS Grounding, a new benchmark designed to evaluate how accurately large language models ground their responses in source material and avoid hallucinations. The benchmark includes a comprehensive evaluation system and online leaderboard to measure LLM factuality performance.
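FACTS Grounding scores responses with LLM judges; the word-overlap check below is only a stand-in to show the general shape of a grounding metric (fraction of response sentences attributable to the provided source), not the benchmark's actual methodology.

```python
def grounded_fraction(response, source):
    """Fraction of response sentences whose content words all appear in the source.

    A crude proxy for grounding: sentences introducing words absent from the
    source are treated as unsupported. Illustrative only.
    """
    source_words = set(source.lower().split())
    sentences = [s.strip() for s in response.split(".") if s.strip()]

    def supported(sentence):
        words = [w for w in sentence.lower().split() if len(w) > 3]
        return bool(words) and all(w in source_words for w in words)

    return sum(supported(s) for s in sentences) / max(1, len(sentences))

source = ("the report says revenue grew nine percent in 2023 "
          "with growth driven by cloud sales")
response = ("Revenue grew nine percent in 2023. "
            "Growth was driven by cloud sales. Profit doubled.")
print(grounded_fraction(response, source))  # roughly 2 of 3 sentences supported
```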
AI · Bullish · Hugging Face Blog · Jan 29 · 6/10
The article announces the launch of The Hallucinations Leaderboard, an open initiative designed to measure and track hallucinations in large language models. This effort aims to provide transparency and benchmarking for AI model reliability across different systems.