AI · Neutral · arXiv – CS AI · 14h ago · 7/10
🧠A comprehensive research study reveals that Retrieval-Augmented Generation (RAG) systems require context-aware deployment strategies rather than universal approaches. The analysis across multiple LLMs and datasets shows that RAG effectiveness depends heavily on task type, with optimal retrieval volumes and knowledge integration methods varying significantly between question answering and code generation applications.
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers introduce GRIP, a unified framework that integrates retrieval decisions directly into language model generation through control tokens, eliminating the need for external retrieval controllers. The system enables models to autonomously decide when to retrieve information, reformulate queries, and terminate retrieval within a single autoregressive process, achieving performance competitive with GPT-4o while using substantially fewer parameters.
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers introduce Disco-RAG, a discourse-aware framework that enhances Retrieval-Augmented Generation (RAG) systems by explicitly modeling discourse structures and rhetorical relationships between retrieved passages. The method achieves state-of-the-art results on question answering and summarization tasks without fine-tuning, demonstrating that structural understanding of text significantly improves LLM performance on knowledge-intensive tasks.
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
🧠Researchers propose PassiveQA, a new AI framework that teaches language models to recognize when they don't have enough information to answer questions, choosing to ask for clarification or abstain rather than hallucinate responses. The three-action system (Answer, Ask, Abstain) uses supervised fine-tuning to align model behavior with information sufficiency, showing significant improvements in reducing hallucinations.
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠Researchers introduce A.DOT Planner, an AI framework that enables multi-hop question answering across hybrid data lakes containing both structured and unstructured data. The system uses directed acyclic graphs to orchestrate complex queries, achieving 14.8% better accuracy and 10.7% better completeness than existing solutions.
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers demonstrate that coreference resolution significantly improves Retrieval-Augmented Generation (RAG) systems by reducing ambiguity in document retrieval and enhancing question-answering performance. The study finds that smaller language models benefit more from disambiguation, and that mean pooling strategies capture context better after coreference resolution.
AI · Bearish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers introduce ObfusQAte, a new framework to test Large Language Model robustness when faced with obfuscated or disguised factual questions. The study reveals that LLMs tend to fail or generate hallucinated responses when confronted with increasingly complex variations of questions across three dimensions of obfuscation.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers developed a new training method combining Chain-of-Thought supervision with reinforcement learning to teach large language models when to abstain from answering temporal questions they're uncertain about. Their approach enabled a smaller Qwen2.5-1.5B model to outperform GPT-4o on temporal question answering tasks while improving reliability by 20% on unanswerable questions.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers propose a neuro-symbolic framework for constructing knowledge graphs that combines LLM-based extraction with post-hoc ontology constraint validation, reducing token costs while improving consistency for complex question-answering tasks. The method defers corrections to after extraction rather than during it, enabling SQL-like querying capabilities for multi-hop reasoning across documents.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce GrepSeek, an AI search agent that interacts directly with text corpora using shell commands rather than traditional retrieval indexes. The system combines supervised learning with reinforcement optimization to achieve state-of-the-art results on question-answering benchmarks while operating at scale through parallel execution techniques.
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce CRITIC-R1, a structured framework that uses reinforcement learning to improve retrieval-augmented generation (RAG) systems by diagnosing and correcting errors in AI-generated answers. The approach outperforms existing RAG methods by providing fine-grained, multi-dimensional feedback rather than coarse corrections, addressing persistent hallucination and reasoning problems in knowledge-intensive question answering.
AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers introduce MiRD, a two-stage framework that improves reliable prediction for open-ended question answering by separately addressing sampling failures and selection errors. The approach maintains calibration-set integrity while controlling hallucinations in AI models, outperforming existing conformal prediction methods across multiple datasets and models.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠PathISE is a novel framework that enables knowledge graph question-answering systems to learn effective supervision signals from answer-level labels alone, eliminating the need for expensive intermediate annotations. By using a transformer-based estimator to identify informative relation paths and distilling them into LLM path generators, the approach achieves performance competitive with the state of the art while reducing the resources required for training.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce Sem-ECE, a new framework for evaluating how well large language models calibrate their confidence in open-ended question answering tasks. The method samples multiple answers from LLMs, groups them semantically, and uses answer frequency distributions as confidence measures, outperforming existing evaluation approaches across major commercial models.
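The sample, group, and count idea in this summary can be illustrated with a minimal sketch. It is not the paper's implementation: the `normalize` helper is a hypothetical stand-in for semantic grouping (it only lowercases and strips punctuation), and the calibration score is the standard binned expected calibration error, which is assumed here rather than taken from the source.

```python
from collections import Counter


def normalize(answer: str) -> str:
    # Hypothetical stand-in for semantic grouping (assumption): answers that
    # differ only in case or punctuation fall into the same group.
    kept = "".join(ch for ch in answer.lower() if ch.isalnum() or ch.isspace())
    return kept.strip()


def frequency_confidence(samples):
    """Return the most frequent answer group and its share of the samples."""
    counts = Counter(normalize(s) for s in samples)
    answer, count = counts.most_common(1)[0]
    return answer, count / len(samples)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: weighted mean of |accuracy - confidence| per bin."""
    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0.0 goes in the first bin.
        idx = [
            i for i, c in enumerate(confidences)
            if lo < c <= hi or (b == 0 and c == 0.0)
        ]
        if not idx:
            continue
        accuracy = sum(correct[i] for i in idx) / len(idx)
        mean_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / total * abs(accuracy - mean_conf)
    return ece


# Usage: four sampled answers to one question, three of which agree.
answer, confidence = frequency_confidence(["Paris", "paris.", "Lyon", "Paris"])
```

A real system would replace `normalize` with a semantic equivalence check, for example an entailment or embedding-based clustering step; the frequency and ECE arithmetic would stay the same.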
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce HOME-KGQA, a new benchmark dataset for evaluating knowledge graph question answering systems on household activities using multimodal data. The dataset reveals significant performance gaps in current LLM-based KGQA methods, highlighting critical challenges for real-world deployment of AI systems that combine language models with structured knowledge.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠A new survey examines how Large Language Models are transforming time series analysis by shifting from traditional task-specific forecasting toward a unified question-answering framework. The research proposes three alignment paradigms to bridge the gap between LLM capabilities and temporal data analysis, offering practical guidance for selecting appropriate methodologies across domains.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠Researchers introduce TopBench, a benchmark dataset of 779 samples designed to evaluate how well Large Language Models handle implicit prediction tasks over tabular data—queries requiring inference from historical patterns rather than simple data retrieval. Testing reveals current LLMs struggle with intent recognition and default to lookup-based approaches, indicating that accurate intent disambiguation is critical before predictive reasoning can succeed.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers present Deliberative Searcher, a framework that enhances large language model reliability by combining certainty calibration with retrieval-based search for question answering. The system uses reinforcement learning with soft reliability constraints to improve alignment between model confidence and actual correctness, producing more trustworthy outputs.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce CFMS, a two-stage framework combining multimodal large language models with symbolic reasoning to improve tabular data comprehension for question answering and fact verification tasks. The approach achieves competitive results on WikiTQ and TabFact benchmarks while demonstrating particular robustness with large tables and smaller model architectures.
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers introduced GroundedKG-RAG, a new retrieval-augmented generation system that creates knowledge graphs directly grounded in source documents to improve long-document question answering. The system reduces resource consumption and hallucinations while maintaining accuracy comparable to state-of-the-art models at lower cost.
AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠Researchers propose MixDemo, a new GraphRAG framework that uses a Mixture-of-Experts mechanism to select high-quality demonstrations for improving large language model performance in domain-specific question answering. The framework includes a query-specific graph encoder to reduce noise in retrieved subgraphs and significantly outperforms existing methods across multiple textual graph benchmarks.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠GlobalRAG is a new reinforcement learning framework that significantly improves multi-hop question answering by decomposing questions into subgoals and coordinating retrieval with reasoning. The system achieves an average improvement of 14.2% in performance metrics while using only 42% of the training data required by baseline models.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers propose EvalAct, a new method that improves retrieval-augmented AI agents by converting retrieval quality assessment into explicit actions and using Process-Calibrated Advantage Rescaling (PCAR) for optimization. The approach shows superior performance on multi-step reasoning tasks across seven open-domain QA benchmarks by providing better process-level feedback signals.
AI · Bullish · arXiv – CS AI · Mar 11 · 6/10
🧠Researchers propose TaSR-RAG, a new framework that improves Retrieval-Augmented Generation systems by using taxonomy-guided structured reasoning for better evidence selection. The system decomposes complex questions into triple sub-queries and performs step-wise evidence matching, achieving up to 14% performance improvements over existing RAG baselines on multi-hop question answering benchmarks.
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠Researchers introduced RAMoEA-QA, a new AI system that uses hierarchical specialization to answer questions about respiratory audio recordings from mobile devices. The system employs a two-stage routing approach with Audio Mixture-of-Experts and Language Mixture-of-Adapters to handle diverse recording conditions and query types, achieving 0.72 test accuracy compared to 0.61-0.67 for existing baselines.