y0news

#hallucination-detection News & Analysis

16 articles tagged with #hallucination-detection. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 4d ago · 7/10

Benchmarking Deflection and Hallucination in Large Vision-Language Models

Researchers introduce VLM-DeflectionBench, a new benchmark with 2,775 samples designed to evaluate how large vision-language models handle conflicting or insufficient evidence. The study reveals that most state-of-the-art LVLMs fail to appropriately deflect when faced with noisy or misleading information, highlighting critical gaps in model reliability for knowledge-intensive tasks.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Weakly Supervised Distillation of Hallucination Signals into Transformer Representations

Researchers developed a weak supervision framework to detect hallucinations in large language models by distilling grounding signals into transformer representations during training. Using substring matching, sentence embeddings, and LLM judges, they created a 15,000-sample dataset and trained five probing classifiers that achieve hallucination detection from internal activations alone at inference time, eliminating the need for external verification systems.
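
The probing idea lends itself to a compact sketch. Assuming per-sample hidden-state activations are available (here replaced by synthetic vectors with a planted "hallucination direction"), a linear probe can be trained to separate grounded from hallucinated generations from activations alone; the paper's weak labels and 15,000-sample dataset are stood in by random data.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                      # hidden-state dimension (assumed)
n = 400                     # labeled samples (the paper uses 15,000)

# Synthetic stand-in: hallucinated activations are shifted along a fixed
# direction, mimicking a detectable signal in the model's representations.
direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)           # 1 = hallucinated
acts = rng.normal(size=(n, d)) + 1.5 * np.outer(labels, direction)

# Logistic-regression probe trained by plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(500):
    z = np.clip(acts @ w + b, -30, 30)
    p = 1.0 / (1.0 + np.exp(-z))
    w -= 0.5 * (acts.T @ (p - labels)) / n
    b -= 0.5 * float(np.mean(p - labels))

preds = (acts @ w + b) > 0
accuracy = float(np.mean(preds == labels))
print(f"probe accuracy: {accuracy:.2f}")
```

At inference time such a probe needs only the activations of the generating model, which is what lets the paper drop external verification systems.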

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

Blending Human and LLM Expertise to Detect Hallucinations and Omissions in Mental Health Chatbot Responses

Researchers demonstrate that standard LLM-as-a-judge methods achieve only 52% accuracy in detecting hallucinations and omissions in mental health chatbots, failing in high-risk healthcare contexts. A hybrid framework combining human domain expertise with machine learning features achieves significantly higher performance (0.717-0.849 F1 scores), suggesting that transparent, interpretable approaches outperform black-box LLM evaluation in safety-critical applications.

AI · Bearish · arXiv – CS AI · Apr 7 · 7/10

Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models

Researchers present a new framework for AI safety that identifies a 57-token predictive window for detecting potential failures in large language models. The study found that only one out of seven tested models showed predictive signals before committing to problematic outputs, while factual hallucinations produced no detectable warning signs.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Evolutionary Search for Automated Design of Uncertainty Quantification Methods

Researchers developed an LLM-powered evolutionary search method to automatically design uncertainty quantification systems for large language models, achieving up to 6.7% improvement in performance over manual designs. The study found that different AI models employ distinct evolutionary strategies, with some favoring complex linear estimators while others prefer simpler positional weighting approaches.

Claude · Sonnet · Opus
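
A toy version of the search loop conveys the idea: candidates here are just exponents for a positional weighting of per-token log-probs, and fitness is how cleanly the resulting uncertainty score separates synthetic "wrong" from "right" answers. This is purely illustrative; the paper evolves far richer scorer designs with an LLM proposing mutations.

```python
import random

random.seed(0)

def score(token_logprobs, alpha):
    # Positionally weighted mean negative log-prob; alpha > 0 weights
    # later tokens more heavily, alpha < 0 weights earlier tokens.
    weights = [(i + 1) ** alpha for i in range(len(token_logprobs))]
    return sum(w * -lp for w, lp in zip(weights, token_logprobs)) / sum(weights)

# Synthetic data: wrong answers have depressed log-probs late in the sequence.
def sample(wrong):
    lps = [random.uniform(-1.0, -0.1) for _ in range(8)]
    if wrong:
        lps[-3:] = [lp - 2.0 for lp in lps[-3:]]
    return lps

data = [(sample(w), w) for w in [True, False] * 50]

def fitness(alpha):
    wrong = [score(lps, alpha) for lps, w in data if w]
    right = [score(lps, alpha) for lps, w in data if not w]
    return min(wrong) - max(right)        # separation margin

# (1+1) evolution: keep mutating the best candidate, accept improvements.
best = 0.0
for _ in range(30):
    child = best + random.gauss(0, 0.5)
    if fitness(child) > fitness(best):
        best = child
print(f"evolved positional exponent: {best:.2f}")
```

The positional-weighting candidate mirrors the paper's observation that some models converge on simple positional schemes rather than complex estimators.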
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems

Researchers developed SCoOP, a training-free framework that combines multiple Vision-Language Models to improve uncertainty quantification and reduce hallucinations in AI systems. The method achieves 10-13% better hallucination detection performance compared to existing approaches while adding only microsecond-level overhead to processing time.
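
The bare mechanism can be sketched in a few lines: pool each model's answer distribution (a linear opinion pool) and read the pooled entropy as an uncertainty signal. SCoOP's actual semantic-consistency weighting is more involved; this shows only the underlying idea that cross-model disagreement surfaces as high pooled uncertainty.

```python
import math

def pool(distributions):
    # Linear opinion pool: average the models' probability vectors.
    k, n = len(distributions), len(distributions[0])
    return [sum(d[i] for d in distributions) / k for i in range(n)]

def entropy(p):
    return -sum(q * math.log(q) for q in p if q > 0)

# Three models agree on answer 0 -> low pooled entropy (confident).
agree = [[0.9, 0.05, 0.05], [0.85, 0.1, 0.05], [0.9, 0.05, 0.05]]
# Models disagree -> high pooled entropy (hallucination risk).
disagree = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]

print(entropy(pool(agree)) < entropy(pool(disagree)))  # True
```

Because pooling is a handful of arithmetic operations over already-computed distributions, the microsecond-level overhead the paper reports is plausible.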

AI × Crypto · Neutral · arXiv – CS AI · Mar 12 · 7/10

Tool Receipts, Not Zero-Knowledge Proofs: Practical Hallucination Detection for AI Agents

Researchers propose NabaOS, a lightweight verification framework that detects AI agent hallucinations using HMAC-signed tool receipts instead of zero-knowledge proofs. The system achieves 94.2% detection accuracy with <15ms verification time, compared to cryptographic approaches that require 180+ seconds per query.
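
The receipt idea is cheap to illustrate with Python's standard library: each tool call is HMAC-signed by the tool side, so a verifier holding the shared key can check whether an agent's claimed tool output was ever actually produced. NabaOS's real receipt format is not described here; the field names and key handling below are illustrative assumptions.

```python
import hashlib
import hmac
import json

SECRET = b"shared-verifier-key"   # assumed shared between tool and verifier

def issue_receipt(tool: str, args: dict, output: str) -> dict:
    # Canonical JSON serialization so signer and verifier hash identical bytes.
    payload = json.dumps({"tool": tool, "args": args, "output": output},
                         sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"tool": tool, "args": args, "output": output, "hmac": tag}

def verify_receipt(receipt: dict) -> bool:
    payload = json.dumps({k: receipt[k] for k in ("tool", "args", "output")},
                         sort_keys=True).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, receipt["hmac"])

r = issue_receipt("search", {"q": "population of France"}, "68 million")
assert verify_receipt(r)          # genuine tool output passes
r["output"] = "12 million"        # a hallucinated claim...
assert not verify_receipt(r)      # ...fails verification
```

A single HMAC-SHA256 over a short payload costs microseconds, which is why this style of check can stay well under the paper's <15ms budget where zero-knowledge proofs cannot.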

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for Hallucination Detection

Researchers propose LEAP, a new framework for detecting AI hallucinations using efficient small models that can dynamically adapt verification strategies. The system uses a teacher-student approach where a powerful model trains smaller ones to detect false outputs, addressing a critical barrier to safe AI deployment in production environments.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

HalluGuard: Demystifying Data-Driven and Reasoning-Driven Hallucinations in LLMs

Researchers introduce HalluGuard, a new framework that identifies and addresses both data-driven and reasoning-driven hallucinations in Large Language Models. The system achieved state-of-the-art performance across 10 benchmarks and 9 LLM backbones, offering a unified approach to improve AI reliability in critical domains like healthcare and law.

AI · Neutral · arXiv – CS AI · Apr 10 · 6/10

Hallucination as output-boundary misclassification: a composite abstention architecture for language models

Researchers propose a composite architecture combining instruction-based refusal with a structural abstention gate to reduce hallucinations in large language models. The system uses a support deficit score derived from self-consistency, paraphrase stability, and citation coverage to block unreliable outputs, achieving better accuracy than either mechanism alone across multiple models.
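
A hedged sketch of the structural gate: combine the three signals named in the summary into a single support deficit and abstain above a threshold. The weights and threshold below are illustrative stand-ins, not the paper's values.

```python
def support_deficit(self_consistency: float,
                    paraphrase_stability: float,
                    citation_coverage: float) -> float:
    # Each signal is in [0, 1]; the deficit measures how much support is missing.
    support = (0.4 * self_consistency
               + 0.3 * paraphrase_stability
               + 0.3 * citation_coverage)
    return 1.0 - support

def should_abstain(scores: dict, threshold: float = 0.5) -> bool:
    return support_deficit(**scores) > threshold

well_supported = {"self_consistency": 0.9,
                  "paraphrase_stability": 0.8,
                  "citation_coverage": 0.85}
unsupported = {"self_consistency": 0.2,
               "paraphrase_stability": 0.3,
               "citation_coverage": 0.0}
print(should_abstain(well_supported))  # False: answer passes the gate
print(should_abstain(unsupported))     # True: structural gate blocks it
```

The gate runs independently of instruction-based refusal, which is how the composite can outperform either mechanism alone.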

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10

HalluJudge: A Reference-Free Hallucination Detection for Context Misalignment in Code Review Automation

Researchers developed HalluJudge, a reference-free system to detect hallucinations in AI-generated code review comments, addressing a key challenge in LLM adoption for software development. The system achieves 85% F1 score with 67% alignment to developer preferences at just $0.009 average cost, making it a practical safeguard for AI-assisted code reviews.

AI · Neutral · arXiv – CS AI · Mar 12 · 6/10

The System Hallucination Scale (SHS): A Minimal yet Effective Human-Centered Instrument for Evaluating Hallucination-Related Behavior in Large Language Models

Researchers have developed the System Hallucination Scale (SHS), a human-centered tool for evaluating hallucination behavior in large language models. The instrument showed strong statistical validity in testing with 210 participants and provides a practical method for assessing AI model reliability from a user perspective.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

Spilled Energy in Large Language Models

Researchers developed a training-free method to detect AI hallucinations by reinterpreting LLM output as Energy-Based Models and tracking 'energy spills' during text generation. The approach successfully identifies factual errors and biases across multiple state-of-the-art models including LLaMA, Mistral, and Gemma without requiring additional training or probe classifiers.
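
One way to read the idea: treat the logits at each decoding step as an energy-based model, with per-step energy E = -logsumexp(logits), and flag steps where the energy trace spikes above its running baseline. The z-score threshold and the synthetic trace below are illustrative assumptions, not the paper's construction.

```python
import math

def energy(logits):
    # Numerically stable -logsumexp over a step's logits.
    m = max(logits)
    return -(m + math.log(sum(math.exp(l - m) for l in logits)))

def spill_steps(trace, z_thresh=2.0):
    """Flag steps whose energy sits z_thresh std-devs above the trace mean."""
    energies = [energy(step) for step in trace]
    mu = sum(energies) / len(energies)
    var = sum((e - mu) ** 2 for e in energies) / len(energies)
    sd = var ** 0.5 or 1e-9
    return [i for i, e in enumerate(energies) if (e - mu) / sd > z_thresh]

# Synthetic trace: confident (peaked) steps, then one diffuse high-energy step.
confident = [8.0, 0.5, 0.2, 0.1]
diffuse = [1.1, 1.0, 0.9, 1.0]
trace = [confident] * 9 + [diffuse]
print(spill_steps(trace))  # [9]: only the diffuse step is flagged
```

Because the score is computed directly from quantities the model already emits, no extra training or probe classifier is needed, matching the summary's training-free claim.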

AI · Neutral · Apple Machine Learning · Mar 3 · 5/10

Learning to Reason for Hallucination Span Detection

Researchers are developing new methods to detect hallucinations in large language models by identifying specific spans of unsupported content rather than making binary decisions. The study evaluates Chain-of-Thought reasoning approaches to improve the complex multi-step process of hallucination span detection in LLMs.