AI · Bullish · arXiv – CS AI · Jun 5 · 7/10
🧠FIDES is a training-free decoder that improves how language models handle conflicts between retrieved evidence and internal knowledge by applying selective, token-level corrections rather than uniform adjustments. The method reaches 92-94% context fidelity across multiple model scales, demonstrating that targeted intervention at critical decoding points outperforms existing contrastive decoding approaches.
AI · Bearish · arXiv – CS AI · May 29 · 7/10
🧠A comprehensive audit of three major AI models reveals that personalized user contexts significantly reshape brand recommendations in commercial AI assistants, with mid-market brands experiencing up to 75% recommendation volatility while category leaders maintain 80% consistency across personas. The study demonstrates that AI recommendation bias is strongly correlated with model architecture and retrieval strategies, with implications for fair evaluation and brand perception measurement.
🏢 OpenAI · 🏢 Anthropic
AI · Neutral · arXiv – CS AI · Apr 15 · 7/10
🧠Researchers introduce VLM-DeflectionBench, a new benchmark with 2,775 samples designed to evaluate how large vision-language models handle conflicting or insufficient evidence. The study reveals that most state-of-the-art LVLMs fail to appropriately deflect when faced with noisy or misleading information, highlighting critical gaps in model reliability for knowledge-intensive tasks.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers introduce CoVER, a new framework for Video Large Language Models that improves long-video understanding by gathering multiple search queries for visual evidence and using answer-specific visual feedback for verification. The approach demonstrates superior performance compared to similarly-sized models and some closed-source alternatives.
AI · Neutral · arXiv – CS AI · Jun 3 · 6/10
🧠Traj-Evolve introduces a self-evolving multi-agent system that models patient trajectories from longitudinal electronic health records for lung cancer early detection. The system combines an Experience Pool for retrieval-augmented few-shot learning with multi-agent reinforcement learning to optimize collaboration, outperforming nine baselines on both general and never-smoker populations.
AI · Bullish · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers introduce Critic-R, a framework that improves agentic search systems by creating a feedback loop between reasoning agents and retrieval models. The approach uses a critic model to evaluate whether retrieved context supports reasoning steps and includes two mechanisms: Critic-R-Zero for query refinement at inference time, and Critic-Embed for training retrievers without manual annotations, demonstrating significant improvements on multi-hop question-answering benchmarks.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose Video Retrieval Augmented Generation (VRAG) to address fundamental challenges in interactive world models for long-form video generation, specifically tackling compounding errors and spatiotemporal incoherence. The work establishes that autoregressive video generation inherently struggles with error accumulation, while explicit global state conditioning significantly improves long-term consistency and interactive planning capabilities.
AI · Neutral · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers developed Budget-Constrained Agentic Search (BCAS) to evaluate how search depth, retrieval strategies, and token budgets affect accuracy and cost in AI search systems. The study found that hybrid retrieval with lightweight re-ranking produces the largest gains, and that accuracy improves with additional searches only up to a small cap.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7
🧠Researchers introduce MERA (Multimodal Mixture-of-Experts with Retrieval Augmentation), a new AI framework for protein active site identification that addresses challenges in drug discovery. The system achieves 90% AUPRC on active site prediction through hierarchical multi-expert retrieval and reliability-aware fusion strategies.