#hallucination-detection News & Analysis

61 articles tagged with #hallucination-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models

Researchers introduce SemGrad, a gradient-based uncertainty quantification method for large language models that operates in semantic space rather than parameter space, eliminating the computational overhead of sampling-based approaches. The method measures output stability under semantically equivalent input perturbations to gauge LLM confidence, addressing the critical challenge of hallucinations in free-form text generation.

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

Learning Uncertainty from Sequential Internal Dispersion in Large Language Models

Researchers introduce Sequential Internal Variance Representation (SIVR), a novel supervised framework for detecting hallucinations in large language models by analyzing token-wise and layer-wise variance patterns in hidden states. The method demonstrates superior generalization compared to existing approaches while requiring smaller training datasets, potentially enabling practical deployment of hallucination detection systems.

AI · Neutral · arXiv – CS AI · Apr 15 · 7/10

Benchmarking Deflection and Hallucination in Large Vision-Language Models

Researchers introduce VLM-DeflectionBench, a new benchmark with 2,775 samples designed to evaluate how large vision-language models handle conflicting or insufficient evidence. The study reveals that most state-of-the-art LVLMs fail to appropriately deflect when faced with noisy or misleading information, highlighting critical gaps in model reliability for knowledge-intensive tasks.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Weakly Supervised Distillation of Hallucination Signals into Transformer Representations

Researchers developed a weak supervision framework to detect hallucinations in large language models by distilling grounding signals into transformer representations during training. Using substring matching, sentence embeddings, and LLM judges, they created a 15,000-sample dataset and trained five probing classifiers that achieve hallucination detection from internal activations alone at inference time, eliminating the need for external verification systems.

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses

Researchers demonstrate that standard LLM-as-a-judge methods achieve only 52% accuracy in detecting hallucinations and omissions in mental health chatbot responses, which falls short in high-risk healthcare contexts. A hybrid framework combining human domain expertise with machine learning features achieves significantly higher performance (F1 scores of 0.717 to 0.849), suggesting that transparent, interpretable approaches outperform black-box LLM evaluation in safety-critical applications.

AI · Bearish · arXiv – CS AI · Apr 7 · 7/10

Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models

Researchers present a new framework for AI safety that identifies a 57-token predictive window for detecting potential failures in large language models. The study found that only one out of seven tested models showed predictive signals before committing to problematic outputs, while factual hallucinations produced no detectable warning signs.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Evolutionary Search for Automated Design of Uncertainty Quantification Methods

Researchers developed an LLM-powered evolutionary search method to automatically design uncertainty quantification systems for large language models, achieving up to 6.7% improvement in performance over manual designs. The study found that different AI models employ distinct evolutionary strategies, with some favoring complex linear estimators while others prefer simpler positional weighting approaches.

🧠 Claude · 🧠 Sonnet · 🧠 Opus

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems

Researchers developed SCoOP, a training-free framework that combines multiple Vision-Language Models to improve uncertainty quantification and reduce hallucinations in AI systems. The method achieves 10-13% better hallucination detection performance compared to existing approaches while adding only microsecond-level overhead to processing time.

AI × Crypto · Neutral · arXiv – CS AI · Mar 12 · 7/10

Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents

Researchers propose NabaOS, a lightweight verification framework that detects AI agent hallucinations using HMAC-signed tool receipts instead of zero-knowledge proofs. The system achieves 94.2% detection accuracy with <15ms verification time, compared to cryptographic approaches that require 180+ seconds per query.
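The general idea of an HMAC-signed tool receipt can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the NabaOS implementation: the key, the receipt layout, and the function names `sign_receipt` and `verify_receipt` are hypothetical. The tool runtime signs each tool output with a secret the model never sees, so a claimed tool result that the model invented or altered fails verification.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret held by the tool runtime, never exposed to the model.
SECRET_KEY = b"demo-secret-key"


def _payload(tool_name: str, result: dict) -> str:
    """Serialize the tool call deterministically so the tag is reproducible."""
    return json.dumps({"tool": tool_name, "result": result}, sort_keys=True)


def sign_receipt(tool_name: str, result: dict) -> dict:
    """Return a receipt: the serialized tool output plus an HMAC-SHA256 tag."""
    payload = _payload(tool_name, result)
    tag = hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_receipt(receipt: dict) -> bool:
    """Recompute the tag over the payload and compare in constant time."""
    expected = hmac.new(
        SECRET_KEY, receipt["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, receipt["tag"])


# A genuine receipt issued by the runtime.
receipt = sign_receipt("weather_lookup", {"city": "Oslo", "temp_c": 4})

# A fabricated result that reuses the genuine tag.
forged = {
    "payload": _payload("weather_lookup", {"city": "Oslo", "temp_c": 24}),
    "tag": receipt["tag"],
}
```

Here `verify_receipt(receipt)` accepts the genuine receipt and `verify_receipt(forged)` rejects the altered one. Verification is a single hash computation, which is consistent with the low verification latency the summary reports relative to zero-knowledge proofs.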

AI · Neutral · arXiv – CS AI · Mar 12 · 7/10

Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability

Researchers introduce TRACED, a framework that evaluates AI reasoning quality through geometric analysis rather than traditional scalar probabilities. The system identifies correct reasoning as high-progress stable trajectories, while AI hallucinations show low-progress unstable patterns with high curvature fluctuations.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for Hallucination Detection

Researchers propose LEAP, a new framework for detecting AI hallucinations using efficient small models that can dynamically adapt verification strategies. The system uses a teacher-student approach where a powerful model trains smaller ones to detect false outputs, addressing a critical barrier to safe AI deployment in production environments.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

Researchers introduce HalluGuard, a new framework that identifies and addresses both data-driven and reasoning-driven hallucinations in Large Language Models. The system achieved state-of-the-art performance across 10 benchmarks and 9 LLM backbones, offering a unified approach to improve AI reliability in critical domains like healthcare and law.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10

Structure and Redundancy in Large Language Models: A Spectral Study via Random Matrix Theory

Researchers have developed a unified framework using Spectral Geometry and Random Matrix Theory to address reliability and efficiency challenges in large language models. The study introduces EigenTrack for real-time hallucination detection and RMT-KD for model compression while maintaining accuracy.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Fine-Grained Uncertainty Quantification for Long-Form Language Model Outputs: A Comparative Study

Researchers introduce a comprehensive framework for detecting hallucinations in long-form language model outputs through fine-grained uncertainty quantification, finding that simpler claim-level consistency methods outperform complex alternatives. The study provides practical guidance for improving factuality in extended LLM generations across STEM and geography domains.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

From Text Metrics to Model Internals: A Study of Whisper ASR Hallucination Detection

Researchers developed multiple approaches to detect hallucinations in OpenAI's Whisper ASR model, where the system generates fluent but unfounded transcriptions. The study found that probing the model's internal decoder states outperformed text-based and LLM-based detection methods, with a hybrid approach combining text metrics and internal representations achieving the best overall performance.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

MotionHalluc: Diagnosing Kinematic Hallucinations in Fine-Grained Motion Reasoning

Researchers introduce MotionHalluc, a benchmark dataset for evaluating how AI models hallucinate when analyzing motion differences between paired videos. The study reveals that large multimodal models struggle with directional, attributional, and temporal hallucinations in motion reasoning, but shows that injecting explicit kinematic measurements can improve accuracy by 10.6%.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

From RAG to Agentic RAG for Faithful Islamic Question Answering

Researchers introduced IslamicFaithQA, a 3,810-item bilingual benchmark and agentic RAG framework designed to improve the accuracy and reliability of Islamic question-answering systems. The work addresses critical gaps in LLM evaluation by measuring hallucination rates and abstention capabilities, achieving state-of-the-art performance through iterative evidence-seeking mechanisms grounded in Qur'anic text.

🏢 Hugging Face

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

Trustworthy Multi-Agent Systems: Mitigating Semantic Drift with the Argent Signaling Protocol

Researchers introduce the Argent Signaling Protocol (ASP), a structured metadata framework that helps multi-agent AI systems distinguish between repairable failures and unrecoverable errors by tagging responses with quality signals including certainty, grounding, and stochasticity. Testing across multiple language models shows significant improvements in accuracy and error containment, with particular success in blocking ungrounded information from propagating through agent pipelines.

AI · Bullish · arXiv – CS AI · Jun 10 · 6/10

Density Ridge Selective Prediction for LLM and VLM Hallucination Detection under Calibration Label Scarcity

Researchers propose a density ridge-based method for detecting hallucinations in large language and vision-language models that outperforms existing approaches by 5-20 AUROC points while requiring minimal calibration labels. The technique maps hidden state trajectories to a low-dimensional geometric skeleton, enabling robust hallucination detection even when training data is scarce.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection

Researchers propose Evidence Graph Consistency (EGC), a framework to detect hallucinations in Retrieval-Augmented Generation systems by analyzing structural relationships among evidence pieces. Testing across six LLMs reveals a critical finding: the method works as expected for Llama-2 but shows reversed diagnostic signals for GPT-4, GPT-3.5, and Mistral-7B, suggesting hallucination patterns differ fundamentally across model families.

🧠 GPT-4 · 🧠 Llama

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali

Researchers introduce BenHalluEval, the first hallucination evaluation framework for Bengali-language LLMs, covering four task categories with 12,000 test cases across seven models. The framework reveals significant performance gaps and demonstrates that standard evaluation metrics fail to capture hallucination risks in low-resource languages.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Entropy Distribution as a Fingerprint for Hallucinations in Generative Models

Researchers propose Calibrated Entropy Score (CES), a novel method for detecting hallucinations in large language models using entropy distribution patterns from a single forward pass. The technique achieves performance comparable to computationally expensive multi-sample methods while requiring only black-box access to token logits, with formal mathematical guarantees for detection accuracy.
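To make the underlying signal concrete, the sketch below computes per-token Shannon entropy from logits collected in one forward pass. It is an illustrative baseline only, not the Calibrated Entropy Score itself: the summary does not describe the calibration step, and the helper names and toy logits are assumptions.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution at one step."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_profile(step_logits):
    """Per-step entropies for one generated sequence (one logit vector per token)."""
    return [token_entropy(logits) for logits in step_logits]


# Toy example with a 4-token vocabulary: a confident step, then an uncertain one.
profile = entropy_profile([[10.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
```

A uniform distribution over four tokens yields the maximum entropy, `math.log(4)`, while a sharply peaked distribution yields a value near zero. A detector in this family would then summarize the shape of `profile` across the whole sequence rather than rely on a single step.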

🏢 Perplexity

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Automatic Layer Selection for Hallucination Detection

Researchers propose FEPoID, a training-free method for automatically selecting optimal layers in large language models to detect hallucinations. The approach outperforms existing criteria and baselines while introducing a truncation strategy that further enhances detection performance across question answering and summarization tasks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

A Reflective Storytelling Agent for Older Adults: Integrating Argumentation Schemes and Argument Mining in LLM-Based Personalised Narratives

Researchers developed a reflective storytelling agent that combines large language models with knowledge graphs and argumentation theory to generate personalized narratives for older adults. Testing with 55 participants showed the system successfully identified personally relevant purposes in two-thirds of narratives, with argument-based grounding and hallucination detection significantly improving perceived consistency and clarity.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

Do Benchmarks Underestimate LLM Performance? Evaluating Hallucination Detection With LLM-First Human-Adjudicated Assessment

A new study challenges whether standard LLM benchmarks accurately measure hallucination detection performance. By having human adjudicators re-evaluate conflicting cases between original annotations and model predictions, researchers found that LLMs frequently made correct judgments that human annotators initially missed, suggesting single-pass human annotation may be insufficient for complex, ambiguous tasks.

🧠 GPT-5 · 🧠 Gemini
