#ai-reliability News & Analysis

255 articles tagged with #ai-reliability. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

The Token Not Taken: Sampling, State, and the Variability of AI Agent Outputs

A new arXiv paper analyzes the sources of variability in agentic AI systems, distinguishing between token-sampling randomness intrinsic to foundation models and external factors like environmental changes and infrastructure effects. The research clarifies when AI agent outputs are genuinely stochastic versus reproducible, with implications for understanding AI reliability in production deployments.
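To make the sampling side of this distinction concrete, here is a minimal sketch of token-sampling randomness alone. It assumes a toy setup in which fixed lists of logits stand in for a model's per-step outputs, and the names `sample_token` and `generate` are hypothetical, not taken from the paper. With a fixed seed the sampled sequence repeats, and greedy decoding ignores the random generator entirely. The external factors the paper also examines, such as environmental and infrastructure changes, are not modeled here.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index; temperature 0 means greedy (argmax) decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    # Softmax numerators, shifted by the max for numerical stability.
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(logits_per_step, temperature, seed):
    """Decode a toy sequence; the fixed logits stand in for a model."""
    rng = random.Random(seed)
    return [sample_token(step, temperature, rng) for step in logits_per_step]

steps = [[2.0, 1.5, 0.1], [0.3, 0.2, 0.25], [1.0, 3.0, 0.5]]

# Same seed and same logits: the sampled sequence repeats exactly.
assert generate(steps, 0.8, seed=7) == generate(steps, 0.8, seed=7)
# Greedy decoding does not consult the RNG, so the seed is irrelevant.
print(generate(steps, 0, seed=1))  # [0, 0, 1]
```

In this toy, reproducibility depends only on the seed because the logits never change.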

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

REFLECT: Intervention-Supported Error Attribution for Silent Failures in LLM Agent Traces

REFLECT is a new method for identifying errors in long reasoning traces produced by LLM agents, particularly addressing the challenging "silent failure" problem where outputs appear plausible but are incorrect. The approach improves upon existing error-localization techniques by using controlled replay and contrastive evidence to refine error attribution, achieving higher accuracy across multiple benchmarks without requiring ground-truth answers.

AI · Bearish · arXiv – CS AI · Jun 9 · 6/10

Evaluating Hallucinations in Domain-Adapted Large Language Models

Researchers investigating hallucinations in fine-tuned Large Language Models found that domain adaptation via fine-tuning alone is insufficient to prevent inaccurate outputs. Testing Llama-2 with domain-specific data revealed the model struggles with novel reasoning tasks and tends to over-generate information, highlighting fundamental limitations in current LLM adaptation techniques.

🧠 Llama

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Principled Agent Debate: Adversarial Arbitration for Sycophancy Reduction in Large Language Models

Researchers present Principled Agent Debate (PAD), a multi-agent architecture that reduces sycophancy in large language models by having two models with opposing dispositions argue their positions while a blind arbitrator evaluates them. On 200 test questions, PAD variants reach 48.5–53% accuracy compared with 18.5% for single models, indicating that the debate setup favors truthful answers over sycophantic agreement.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Training-Inference Kernel Contracts: Bounding Divergence in Post-Training and Deployment

Researchers propose 'kernel contracts,' a framework for managing divergence between training and inference implementations of AI models that operate at different precision levels. The work formalizes how finite-precision optimizations can produce different outputs at identical weights and provides mathematical bounds on resulting policy drift, with implications for reliable AI deployment.
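The premise that identical weights can yield different outputs can be illustrated with floating-point accumulation order. The sketch below is a contrived toy, not the paper's kernel-contract formalism or its bounds: two hypothetical "kernels" compute the same dot product in float64 but accumulate in opposite orders, and the magnitudes are chosen so that rounding absorbs the small term in one order but not the other, which is enough to flip a greedy decision.

```python
import math

def dot_forward(w, x):
    """Accumulate the products left to right (one hypothetical kernel)."""
    total = 0.0
    for wi, xi in zip(w, x):
        total += wi * xi
    return total

def dot_reversed(w, x):
    """Same products, accumulated right to left (another hypothetical kernel)."""
    total = 0.0
    for wi, xi in zip(reversed(w), reversed(x)):
        total += wi * xi
    return total

def greedy(logits):
    """Index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Near 1e16, adjacent float64 values are 2.0 apart, so 1e16 + 1.0 rounds to 1e16.
w = [1.0, 1e16, -1e16]
x = [1.0, 1.0, 1.0]

print(dot_forward(w, x))                       # 0.0 (the 1.0 is absorbed)
print(dot_reversed(w, x))                      # 1.0
print(math.fsum(a * b for a, b in zip(w, x)))  # 1.0 (correctly rounded sum)

# Treat each result as a logit competing against a fixed logit of 0.5.
print(greedy([dot_forward(w, x), 0.5]))   # 1
print(greedy([dot_reversed(w, x), 0.5]))  # 0
```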

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

The ACUTE Protocol: Operationalizing Language Model Activations for Better Calibration, Utility, and Trust

Researchers introduce ACUTE, a protocol that uses language model activations to improve confidence calibration and trustworthiness across multiple LLM tasks. The approach balances calibration accuracy with informativeness through a new EURO metric, addressing the persistent problem of overconfident AI systems.
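The summary does not define the EURO metric, so the sketch below swaps in a different, standard quantity: expected calibration error (ECE) with equal-width confidence bins, which measures the kind of overconfidence ACUTE is aimed at. The function name and toy data are illustrative only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: per-bin |accuracy - mean confidence|, weighted by bin size."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece

# Overconfident toy model: 90% confidence, 50% accuracy.
print(round(expected_calibration_error([0.9] * 4, [True, True, False, False]), 3))  # 0.4
# Calibrated toy model: 50% confidence, 50% accuracy.
print(expected_calibration_error([0.5] * 4, [True, False, True, False]))  # 0.0
```

ECE only captures calibration; the summary's point is that ACUTE also weighs informativeness, which this sketch does not attempt.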

AI · Bearish · arXiv – CS AI · Jun 9 · 6/10

GIScholarBench: Benchmarking LLM Overconfidence in GIS Research

Researchers introduced GIScholarBench, a benchmark testing whether large language models exhibit overconfidence when performing academic research tasks. Evaluating Claude, Gemini, and ChatGPT on 10,865 GIS papers, the study found all models generate confident outputs even when knowledge is incomplete, particularly in citation generation and research ideation tasks.

🧠 ChatGPT · 🧠 Claude · 🧠 Sonnet

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Constrained Paraphrase Consistency for LLM Hallucination Detection

Researchers introduce CCHD, a new hallucination detection method for large language models that uses paraphrase consistency constraints to improve factuality checking without expanding training datasets. The approach outperforms existing baselines like FactCG and MiniCheck while adding minimal computational overhead.
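The summary does not spell out CCHD's constraints, so what follows is only a generic sketch of paraphrase consistency. It assumes some fact-checker has already scored the original claim and several paraphrases against the evidence; the function `consistency_flag` and both thresholds are hypothetical. A claim is flagged when its support is low on average or unstable across rewordings.

```python
from statistics import mean, pstdev

def consistency_flag(scores, support_threshold=0.5, max_spread=0.2):
    """Return True (likely hallucination) if support across paraphrases
    is low on average or varies too much between rewordings."""
    avg = mean(scores)
    spread = pstdev(scores)
    return avg < support_threshold or spread > max_spread

# Support scores a hypothetical checker gave the claim and two paraphrases.
print(consistency_flag([0.92, 0.88, 0.90]))  # False: supported and stable
print(consistency_flag([0.95, 0.30, 0.85]))  # True: paraphrases disagree
```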

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

Calibration of Structured Ignorance Certificates for Diagnosing Unknown Unknowns in Reasoning Models

Researchers introduce Structured Ignorance Certificates (SICs), a JSON-formatted output schema, and train language models to use it to acknowledge knowledge gaps explicitly rather than hallucinate answers. The work relies on a new 7,347-sample dataset of cross-domain questions and reports 99.46% JSON validity along with measurable improvements in epistemic awareness.
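The summary reports JSON validity but not the certificate's fields, so the schema below (`answer`, `known_gaps`, `confidence`) is a hypothetical stand-in. The sketch shows one way a validity rate over model outputs could be computed; it is stricter than plain JSON parsing, since an output counts as valid only if it is a JSON object with the expected field types and an in-range confidence.

```python
import json

# Hypothetical certificate fields; the paper's actual schema is not given here.
REQUIRED = {"answer": (str, type(None)), "known_gaps": list, "confidence": (int, float)}

def is_valid_certificate(raw):
    """True if raw parses as a JSON object with the required field types."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    for key, types in REQUIRED.items():
        if key not in obj or not isinstance(obj[key], types):
            return False
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(obj["confidence"], bool) or not 0.0 <= obj["confidence"] <= 1.0:
        return False
    return True

def validity_rate(outputs):
    return sum(is_valid_certificate(o) for o in outputs) / len(outputs)

outputs = [
    '{"answer": null, "known_gaps": ["no data after 2021"], "confidence": 0.2}',
    '{"answer": "Paris", "known_gaps": [], "confidence": 0.97}',
    'I am not sure, but probably Paris.',                      # free text, not JSON
    '{"answer": "42", "known_gaps": [], "confidence": 1.7}',   # confidence out of range
]
print(validity_rate(outputs))  # 0.5
```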

AI · Bullish · arXiv – CS AI · Jun 8 · 6/10

AEGIS: A Backup Reflex for Physical AI

Researchers introduce AEGIS, a machine learning method that aims to prevent robot manipulation failures by detecting high-risk steps and switching to a stronger policy only when needed. The system recovers 10.1% of failed trajectories while invoking the stronger policy for just 38% of steps, indicating that selective escalation outperforms both blind backup policies and random triggering.
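Selective escalation can be sketched as a per-step gate: run the cheaper policy by default and switch to the stronger one only when an estimated risk crosses a threshold. This is not AEGIS's actual risk detector; the risk function, both policies, and the threshold below are toy placeholders. The returned fraction corresponds to the kind of "share of steps escalated" figure quoted in the summary.

```python
from typing import Callable, List, Tuple

def run_with_escalation(
    observations: List[float],
    risk: Callable[[float], float],
    base_policy: Callable[[float], str],
    strong_policy: Callable[[float], str],
    threshold: float,
) -> Tuple[List[str], float]:
    """Use the base policy unless the step's estimated risk exceeds `threshold`;
    return the chosen actions and the fraction of steps that escalated."""
    actions = []
    escalated = 0
    for obs in observations:
        if risk(obs) > threshold:
            actions.append(strong_policy(obs))
            escalated += 1
        else:
            actions.append(base_policy(obs))
    return actions, escalated / len(observations)

# Toy stand-ins: each observation is treated directly as a risk score in [0, 1].
actions, frac = run_with_escalation(
    observations=[0.1, 0.8, 0.2, 0.9, 0.3],
    risk=lambda o: o,
    base_policy=lambda o: "base",
    strong_policy=lambda o: "strong",
    threshold=0.5,
)
print(actions)  # ['base', 'strong', 'base', 'strong', 'base']
print(frac)     # 0.4
```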

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

Evidence Graph Consistency in Retrieval-Augmented Generation: A Model-Dependent Analysis of Hallucination Detection

Researchers propose Evidence Graph Consistency (EGC), a framework to detect hallucinations in Retrieval-Augmented Generation systems by analyzing structural relationships among evidence pieces. Testing across six LLMs reveals a critical finding: the method works as expected for Llama-2 but shows reversed diagnostic signals for GPT-4, GPT-3.5, and Mistral-7B, suggesting hallucination patterns differ fundamentally across model families.

🧠 GPT-4 · 🧠 Llama

AI · Bullish · arXiv – CS AI · Jun 8 · 6/10

Enhancing Video Representations with Spatiotemporal-Semantic Residual to Mitigate Hallucinations in Video Large Multimodal Models

Researchers introduce ViSSRes, an inference-time intervention method that reduces hallucinations in Video Large Multimodal Models by enhancing video representations through a lightweight MLP network. The approach achieves a 40.69% reduction in hallucination rates on LLaVA-NeXT-Video while improving video understanding by 18.36%, with minimal computational overhead during inference.

AI · Bullish · Google Research Blog · Jun 5 · 6/10

Unlocking dependable responses with Gemini Enterprise Agent Platform’s Agentic RAG

Google has introduced Agentic RAG capabilities within its Gemini Enterprise Agent Platform, designed to improve the reliability of AI-generated responses through retrieval-augmented generation techniques. This advancement addresses a critical challenge in enterprise AI deployment: reducing hallucinations and ensuring responses are grounded in accurate, up-to-date data sources.

🧠 Gemini

AI · Bullish · arXiv – CS AI · Jun 5 · 6/10

ToolChoiceConfusion: Causal Minimal Tool Filtering for Reliable LLM Agents

Researchers propose Causal Minimal Tool Filtering (CMTF), a training-free method that improves LLM agent reliability by exposing only the tools needed at each step rather than the entire tool menu. The approach cuts token usage by 90% and reduces the number of tools exposed per step from 100 to 1 while maintaining task success rates.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Safe Embodied AI for Long-horizon Tasks: A Cross-layer Analysis of Robotic Manipulation

A comprehensive survey examines safety mechanisms for embodied AI systems performing long-horizon robotic manipulation tasks, identifying critical gaps in current research across planning, policy design, and execution phases. The analysis reveals that while safety receives attention, evidence remains fragmented with limited formal guarantees, particularly for contact-rich manipulation scenarios in real-world deployment.

AI · Neutral · arXiv – CS AI · Jun 3 · 6/10

Uncertainty-Aware Clarification in LLM Agents with Information Gain

Researchers propose an uncertainty-aware clarification framework for LLM agents that uses Information Gain Rewards to optimize clarification questions when user instructions are ambiguous. The method improves task success rates by 3.7% while minimally increasing interaction steps, addressing a critical limitation in autonomous AI systems operating under incomplete information.
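Information gain for a clarification question can be sketched as the expected drop in entropy over the user's possible intents. The example below assumes a hypothetical set of four equally likely intents and assumes each intent determines the user's answer exactly. It computes the textbook quantity, not the paper's Information Gain Reward formulation.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(prior, answer_of):
    """Expected entropy reduction over intents from asking one question.

    prior: intent -> probability; answer_of: intent -> the answer that intent
    would produce (assumed deterministic).
    """
    groups = defaultdict(dict)
    for intent, p in prior.items():
        groups[answer_of[intent]][intent] = p
    expected_after = 0.0
    for group in groups.values():
        p_answer = sum(group.values())
        posterior = [p / p_answer for p in group.values()]
        expected_after += p_answer * entropy(posterior)
    return entropy(prior.values()) - expected_after

# Hypothetical intents behind an ambiguous "export the report" request.
prior = {"csv_daily": 0.25, "csv_weekly": 0.25, "json_daily": 0.25, "json_weekly": 0.25}
by_format = {k: k.split("_")[0] for k in prior}          # "CSV or JSON?"
is_json_weekly = {k: k == "json_weekly" for k in prior}  # "Weekly JSON?"

print(information_gain(prior, by_format))                 # 1.0
print(round(information_gain(prior, is_json_weekly), 3))  # 0.811
```

The format question splits the intents evenly and yields a full bit, so an agent maximizing information gain would ask it before the narrower yes/no question.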