y0news

#hallucination-detection News & Analysis

44 articles tagged with #hallucination-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 14h ago · 7/10

TriEval: A Resource-Efficient Pipeline for LLM Bias, Toxicity, and Truthfulness Assessment

TriEval introduces an open-source pipeline for evaluating large language models across bias, toxicity, and truthfulness simultaneously while requiring minimal computational resources. The tool runs on standard laptops without GPU clusters, making rigorous LLM safety testing accessible to researchers with limited budgets, and reveals significant performance differences between open-source and closed-source models.

Claude · Llama
AI · Bullish · arXiv – CS AI · 1d ago · 7/10

TriLens: Per-Layer Logit-Lens Entropy for White-Box Hallucination Detection

TriLens is a novel white-box detection method that identifies hallucinations in language models by tracking entropy changes across internal computational layers. Rather than examining only final outputs, the technique monitors uncertainty signals from multi-head attention, feed-forward networks, and residual streams using logit lens analysis, creating a compact 3L-dimensional trajectory that reveals how model confidence settles during inference.
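
To make the idea of a per-layer entropy trajectory concrete, here is a minimal sketch in plain Python. It is not the TriLens implementation: the function names, the ordering of the three streams in the output vector, and the omission of any final layer normalization before projection are all assumptions made for illustration. It projects each layer's hidden vector through an unembedding matrix (the logit lens), takes the entropy of the resulting softmax distribution, and concatenates the values for three streams into a vector of length 3L.

```python
import math

def softmax_entropy(logits):
    """Shannon entropy (in nats) of softmax(logits), computed stably."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def project(hidden, unembed):
    """Logit lens: map a hidden vector to vocabulary logits.

    `unembed` is a hypothetical V x d matrix given as a list of rows.
    """
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]

def entropy_trajectory(attn_out, ffn_out, resid, unembed):
    """Return one entropy per layer for each of three streams (3L values).

    Each stream is a list of L hidden vectors, one per layer. The stream
    order (attention, feed-forward, residual) is an assumption.
    """
    if not (len(attn_out) == len(ffn_out) == len(resid)):
        raise ValueError("all three streams must have one vector per layer")
    trajectory = []
    for stream in (attn_out, ffn_out, resid):
        for hidden in stream:
            trajectory.append(softmax_entropy(project(hidden, unembed)))
    return trajectory
```

A downstream detector would consume this vector as features; how TriLens does that is not described in the summary above.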

AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Detect Before You Leap: Mirage Detection in Vision-Language Models

Researchers have developed TC-LIA, a model-agnostic detection method that identifies when Vision-Language Models produce confident but visually ungrounded answers—a failure mode called 'mirage.' The technique achieves 94.6-94.7% accuracy in detecting these hallucinations across multiple VLM architectures, reducing mirage rates from 21.7-66.6% to below 3%, with significant implications for medical and document-based AI systems where false confidence poses safety risks.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

LLM-FACETS: A Privacy-Preserving Framework for Evaluating LLM Transparency and Accountability

Researchers introduce LLM-FACETS, an open-source framework designed to make LLM auditing accessible to non-technical practitioners while preserving data privacy. The system addresses regulatory compliance needs outlined in the EU AI Act and NIST frameworks by providing browser-based evaluation tools that keep sensitive data on self-hosted servers rather than transmitting it to external services.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

From Out-of-Distribution Detection to Hallucination Detection: A Geometric View

Researchers propose treating hallucination detection in large language models as an out-of-distribution (OOD) detection problem, leveraging computer vision techniques to create training-free detectors. This geometric approach shows strong performance on reasoning tasks where existing methods struggle, offering a scalable pathway to improve LLM safety and reliability.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

Hallucination Detection-Guided Preference Optimization for Clinical Summarization

Researchers introduce HDPO, a method that uses hallucination detectors to guide iterative refinement of AI-generated clinical summaries, reducing factual errors by up to 48% in large language models. The approach combines inference-time detection with preference learning for model finetuning, demonstrating significant improvements in factual accuracy while maintaining summary quality for healthcare applications.

Llama
AI · Bearish · arXiv – CS AI · 6d ago · 7/10

Relevant Is Not Warranted: Evidence-Force Calibration for Cited RAG

Researchers identify a critical failure mode in Retrieval-Augmented Generation (RAG) evaluation called 'citation laundering,' where topically relevant sources are presented as evidence for claims they don't actually support. The team introduces FORCEBENCH, a diagnostic benchmark that tests whether AI evaluators can distinguish between evidence-calibrated claims and over-generalized ones, revealing that current evaluation methods fail to detect warrant mismatches in 24-47% of cases.

AI · Neutral · arXiv – CS AI · May 27 · 7/10

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Researchers introduce Trajel, a dataset and evaluation framework for detecting hallucinations in multi-step LLM agent workflows, revealing that existing benchmarks miss intermediate failures. The framework defines five hallucination types and shows that trajectory-level detection outperforms traditional post-hoc verification, highlighting critical gaps in current AI safety evaluation methodologies.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

ScientistOne introduces Chain-of-Evidence, a verifiability framework addressing critical failures in autonomous research systems where AI agents produce plausible-looking but unreliable outputs including fabricated citations, unverified scores, and misaligned methods. The system achieves zero hallucinated references and perfect score verification across five research tasks, significantly outperforming existing baseline systems that exhibit systematic failure rates up to 80%.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Neuro-Symbolic Verification of LLM Outputs for Data-Sensitive Domains (extended preprint)

Researchers present a hybrid neuro-symbolic architecture that combines formal logic with neural semantic analysis to verify LLM outputs in high-stakes domains like healthcare. The system achieves hallucination detection rates of over 83% for structured data and 72% for semantic fabrications while reducing report creation time by 30%, demonstrating practical safeguards for deploying LLMs in data-sensitive applications.

AI · Neutral · arXiv – CS AI · May 27 · 7/10

QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents

Researchers introduce QUACK, an evaluation framework for auditing whether AI agents in social deduction games actually ground their language in perceived reality or hallucinate claims. Testing three frontier vision-language models reveals that even top performers hallucinate 15% of spatial claims and make accusations without evidence, exposing critical gaps in agent reasoning reliability.

AI · Bearish · arXiv – CS AI · May 27 · 7/10

The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context

Researchers identify a critical vulnerability in retrieval-augmented generation systems where language models produce faithful-looking outputs from memory rather than retrieved context, making it impossible to verify source attribution through output analysis alone. They propose Computational Reality Monitoring (CRM), a technique that detects internal representational differences to identify when models rely on pretraining data versus external evidence.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Delulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Tasks

Microsoft researchers released Delulu, a benchmark dataset containing 1,951 code generation samples across 7 programming languages designed to test how well large language models detect hallucinations in Fill-in-the-Middle tasks. Testing 11 open-weight models revealed fundamental limitations, with even the strongest achieving only 84.5% accuracy, indicating that code hallucination remains a persistent challenge across all model families.

AI · Neutral · arXiv – CS AI · May 12 · 7/10

Sanity Checks for Long-Form Hallucination Detection

Researchers introduce a controlled-invariance methodology to distinguish whether hallucination detection in large language models actually evaluates reasoning quality or merely exploits surface-level answer cues. Their lightweight TRACT model demonstrates that effective detection relies primarily on lexical trajectory features rather than complex learned representations, suggesting current detection methods conflate endpoint artifacts with genuine reasoning validation.

AI · Neutral · arXiv – CS AI · May 9 · 7/10

Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident Hallucination

Researchers have identified a geometric framework explaining how language models fail through two distinct mechanisms: parametric memory conflicting with working memory, and hallucination from absent learned facts. Both failures produce confident outputs despite being mechanistically different, but hidden-state geometry and 'geometric margin' metrics can distinguish them more reliably than traditional entropy-based detection methods.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

Gradients with Respect to Semantics Preserving Embeddings Tell the Uncertainty of Large Language Models

Researchers introduce SemGrad, a gradient-based uncertainty quantification method for large language models that operates in semantic space rather than parameter space, eliminating the computational overhead of sampling-based approaches. The method measures output stability under semantically equivalent input perturbations to gauge LLM confidence, addressing the critical challenge of hallucinations in free-form text generation.

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

Learning Uncertainty from Sequential Internal Dispersion in Large Language Models

Researchers introduce Sequential Internal Variance Representation (SIVR), a novel supervised framework for detecting hallucinations in large language models by analyzing token-wise and layer-wise variance patterns in hidden states. The method demonstrates superior generalization compared to existing approaches while requiring smaller training datasets, potentially enabling practical deployment of hallucination detection systems.

AI · Neutral · arXiv – CS AI · Apr 15 · 7/10

Benchmarking Deflection and Hallucination in Large Vision-Language Models

Researchers introduce VLM-DeflectionBench, a new benchmark with 2,775 samples designed to evaluate how large vision-language models handle conflicting or insufficient evidence. The study reveals that most state-of-the-art LVLMs fail to appropriately deflect when faced with noisy or misleading information, highlighting critical gaps in model reliability for knowledge-intensive tasks.

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses

Researchers demonstrate that standard LLM-as-a-judge methods achieve only 52% accuracy in detecting hallucinations and omissions in mental health chatbots, failing in high-risk healthcare contexts. A hybrid framework combining human domain expertise with machine learning features achieves significantly higher performance (0.717-0.849 F1 scores), suggesting that transparent, interpretable approaches outperform black-box LLM evaluation in safety-critical applications.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Weakly Supervised Distillation of Hallucination Signals into Transformer Representations

Researchers developed a weak supervision framework to detect hallucinations in large language models by distilling grounding signals into transformer representations during training. Using substring matching, sentence embeddings, and LLM judges, they created a 15,000-sample dataset and trained five probing classifiers that achieve hallucination detection from internal activations alone at inference time, eliminating the need for external verification systems.

AI · Bearish · arXiv – CS AI · Apr 7 · 7/10

Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models

Researchers present a new framework for AI safety that identifies a 57-token predictive window for detecting potential failures in large language models. The study found that only one out of seven tested models showed predictive signals before committing to problematic outputs, while factual hallucinations produced no detectable warning signs.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Evolutionary Search for Automated Design of Uncertainty Quantification Methods

Researchers developed an LLM-powered evolutionary search method to automatically design uncertainty quantification systems for large language models, achieving up to 6.7% improvement in performance over manual designs. The study found that different AI models employ distinct evolutionary strategies, with some favoring complex linear estimators while others prefer simpler positional weighting approaches.

Claude · Sonnet · Opus
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems

Researchers developed SCoOP, a training-free framework that combines multiple Vision-Language Models to improve uncertainty quantification and reduce hallucinations in AI systems. The method achieves 10-13% better hallucination detection performance compared to existing approaches while adding only microsecond-level overhead to processing time.

AI × Crypto · Neutral · arXiv – CS AI · Mar 12 · 7/10

Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents

Researchers propose NabaOS, a lightweight verification framework that detects AI agent hallucinations using HMAC-signed tool receipts instead of zero-knowledge proofs. The system achieves 94.2% detection accuracy with <15ms verification time, compared to cryptographic approaches that require 180+ seconds per query.
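
As a rough illustration of what an HMAC-signed tool receipt could look like, the sketch below uses Python's standard `hmac`, `hashlib`, and `json` modules. It is not NabaOS's actual format or API: the receipt fields (`tool`, `args`, `result`), the function names, and the choice of SHA-256 are assumptions. The idea is that the tool runtime signs each call's output with a key the model never sees, so a claimed tool result can later be checked against its tag.

```python
import hashlib
import hmac
import json

def _canonical(payload):
    # Sorted keys and fixed separators give a stable byte encoding to sign.
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_receipt(key, tool, args, result):
    """Build a hypothetical receipt: the call payload plus its HMAC-SHA256 tag."""
    payload = {"tool": tool, "args": args, "result": result}
    tag = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def verify_receipt(key, receipt):
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, _canonical(receipt["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["tag"])
```

For example, a receipt created with `sign_receipt(key, "search", {"q": "weather"}, "sunny")` verifies under the same key, while changing the `result` field afterwards makes `verify_receipt` return `False`.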

Page 1 of 2