13 articles tagged with #multi-hop-reasoning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI Bullish · arXiv — CS AI · 4d ago · 7/10
🧠 Researchers introduce reasoning graphs, a persistent knowledge structure that improves language model reasoning accuracy by storing and reusing chains of thought tied to evidence items. The system achieves 47% error reduction on multi-hop questions and maintains deterministic outputs without model retraining, using only context engineering.
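The core idea — persisting reasoning chains keyed by their supporting evidence so they can be reused on later questions — can be sketched as follows. This is a minimal illustration; the class and method names (`ReasoningGraph`, `add_chain`, `reuse`) are invented for the sketch, not taken from the paper.

```python
# Minimal sketch of a persistent reasoning graph: chains of thought are
# stored keyed by the evidence items they depend on, so a later question
# touching the same evidence can reuse the stored chain instead of
# re-deriving it. All names here are illustrative assumptions.
class ReasoningGraph:
    def __init__(self):
        self.chains = {}  # frozenset of evidence ids -> reasoning steps

    def add_chain(self, evidence_ids, steps):
        """Persist a chain of thought tied to the evidence it used."""
        self.chains[frozenset(evidence_ids)] = list(steps)

    def reuse(self, evidence_ids):
        """Return a stored chain whose evidence is covered by the query's."""
        available = set(evidence_ids)
        for needed, steps in self.chains.items():
            if needed <= available:
                return steps
        return None

graph = ReasoningGraph()
graph.add_chain(["doc1", "doc2"], ["A founded B", "B acquired C"])
hit = graph.reuse(["doc1", "doc2", "doc3"])  # superset of stored evidence
miss = graph.reuse(["doc4"])                 # no stored chain applies
```

Because lookups are plain dictionary matches rather than model calls, reuse stays deterministic — consistent with the paper's claim of deterministic outputs via context engineering alone.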
AI Bullish · arXiv — CS AI · Mar 17 · 7/10
🧠 Researchers introduce APEX-Searcher, a new framework that enhances large language models' search capabilities through a two-stage approach combining reinforcement learning for strategic planning and supervised fine-tuning for execution. The system addresses limitations in multi-hop question answering by decoupling retrieval processes into planning and execution phases, showing significant improvements across multiple benchmarks.
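The plan/execute decoupling can be sketched with two stand-in functions: a planner that breaks a multi-hop question into sub-queries, and an executor that retrieves per sub-query. In the paper these roles are filled by an RL-trained planner and a fine-tuned executor; the heuristics below are hypothetical placeholders.

```python
# Hedged sketch of decoupled multi-hop search: plan() produces hops,
# execute() retrieves one document per hop. Both are toy stand-ins for
# the learned planning and execution models described in the paper.
def plan(question):
    """Strategic planning: break a multi-hop question into hops."""
    if "director of the film about" in question:
        topic = question.rsplit("about ", 1)[1].rstrip("?")
        return [f"film about {topic}", "director of that film"]
    return [question]

def execute(sub_query, corpus):
    """Execution: return the first document matching the hop's head word."""
    return next((doc for doc in corpus if sub_query.split()[0] in doc), None)

corpus = ["film about Apollo 11: First Man",
          "director of First Man: Chazelle"]
hops = plan("Who is the director of the film about Apollo 11?")
answers = [execute(h, corpus) for h in hops]
```

Separating the two phases means each can be trained with the objective that suits it — reward signals for planning, supervised targets for execution.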
AI Bullish · arXiv — CS AI · Mar 16 · 7/10
🧠 Researchers propose Budget-Aware Value Tree (BAVT), a training-free framework that improves LLM agent efficiency by intelligently managing computational resources during multi-hop reasoning tasks. The system outperforms traditional approaches while using 4x fewer resources, demonstrating that smart budget management beats brute-force compute scaling.
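The general idea — spend a fixed call budget on the most promising branches of a reasoning tree instead of expanding everything — can be sketched as greedy best-first search under a hard budget. The value function and node layout below are assumptions for illustration, not BAVT's actual design.

```python
# Illustrative, training-free budget manager: always expand the
# highest-value frontier node until the expansion budget is exhausted.
import heapq

def budgeted_search(root_value, expand, budget):
    """Greedy best-first expansion under a hard budget of expand calls."""
    calls = 0
    frontier = [(-root_value, 0, "root")]  # max-heap via negated values
    best = ("root", root_value)
    counter = 1  # tie-breaker so the heap never compares node strings
    while frontier and calls < budget:
        _neg, _, node = heapq.heappop(frontier)
        for child, value in expand(node):
            calls += 1
            if value > best[1]:
                best = (child, value)
            heapq.heappush(frontier, (-value, counter, child))
            counter += 1
    return best, calls

def expand(node):
    """Toy expansion: two children per node, values decaying with depth."""
    depth = node.count("/")
    if depth >= 2:
        return []
    return [(f"{node}/a", 0.9 - 0.1 * depth),
            (f"{node}/b", 0.5 - 0.1 * depth)]

result, used = budgeted_search(0.1, expand, budget=4)
```

The point of the sketch: with a budget of 4 expansions the search still finds the high-value branch, because low-value branches are simply never expanded.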
AI Bullish · arXiv — CS AI · Mar 5 · 7/10
🧠 Researchers developed a new AI training method using knowledge graphs as reward models to improve compositional reasoning in specialized domains. The approach enables smaller 14B-parameter models to outperform much larger frontier systems like GPT-5.2 and Gemini 3 Pro on complex multi-hop reasoning tasks in medicine.
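Using a knowledge graph as a reward model amounts to scoring a candidate reasoning chain by whether its hops correspond to verifiable graph edges. The triple format, toy medical graph, and scoring rule below are assumptions for illustration, not the paper's exact reward design.

```python
# Sketch of a KG-as-reward-model: a reasoning chain earns reward in
# proportion to how many of its hops are actual edges in the graph.
# The triples below are a hand-written toy graph, not real training data.
KG = {
    ("aspirin", "inhibits", "COX-1"),
    ("COX-1", "produces", "thromboxane"),
    ("thromboxane", "promotes", "clotting"),
}

def kg_reward(chain):
    """Fraction of hops in the chain that are verifiable KG edges."""
    if not chain:
        return 0.0
    verified = sum(1 for hop in chain if hop in KG)
    return verified / len(chain)

good_chain = [("aspirin", "inhibits", "COX-1"),
              ("COX-1", "produces", "thromboxane")]
bad_chain = [("aspirin", "cures", "clotting")]  # not an edge: no reward
```

A reward grounded in graph edges is cheap to compute and hard to game, which is plausibly why it helps a small model learn compositional chains in a specialized domain.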
AI Bullish · arXiv — CS AI · 4d ago · 6/10
🧠 Researchers introduce KG-Reasoner, an end-to-end framework that uses reinforcement learning to train large language models to perform multi-hop reasoning over knowledge graphs without decomposing tasks into isolated pipeline steps. The approach demonstrates competitive or superior performance across eight reasoning benchmarks by enabling LLMs to dynamically explore reasoning paths and backtrack when necessary.
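The explore-and-backtrack behavior being trained here is, at its simplest, depth-first path search over the graph: abandon a dead-end branch and return to an earlier node. The toy graph and plain DFS below stand in for the learned exploration policy.

```python
# Sketch of path exploration with backtracking over a knowledge graph.
# "violin" is a deliberate dead end, forcing the search to backtrack
# before finding the correct hop sequence. The graph is invented.
GRAPH = {
    "Einstein": ["violin", "ETH Zurich"],  # dead end tried first
    "violin": [],
    "ETH Zurich": ["Zurich"],
    "Zurich": ["Switzerland"],
}

def find_path(start, goal, path=None):
    """DFS over the KG; returns the hop sequence to goal, or None."""
    path = (path or []) + [start]
    if start == goal:
        return path
    for neighbor in GRAPH.get(start, []):
        if neighbor not in path:
            found = find_path(neighbor, goal, path)
            if found:  # success on this branch
                return found
            # otherwise: backtrack and try the next neighbor
    return None

path = find_path("Einstein", "Switzerland")
```

The RL framing replaces this exhaustive search with a policy that learns which branches are worth exploring, but the backtracking structure is the same.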
AI Neutral · arXiv — CS AI · 4d ago · 6/10
🧠 Researchers propose a graph-based soft prompting framework that enables LLMs to reason over incomplete knowledge graphs by processing subgraph structures rather than explicit node paths, achieving state-of-the-art results on multi-hop question-answering benchmarks while reducing computational costs through a two-stage inference approach.
AI Neutral · arXiv — CS AI · 5d ago · 6/10
🧠 Researchers demonstrate a zero-shot knowledge graph construction pipeline using local open-source LLMs on consumer hardware, achieving 0.70 F1 on document relations and 0.55 exact match on multi-hop reasoning through ensemble methods. The study reveals that strong model consensus often signals collective hallucination rather than accuracy, challenging traditional ensemble assumptions while maintaining low computational costs and carbon footprint.
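The ensemble mechanism — and the caveat about consensus — can be sketched with majority voting over extracted triples. The model outputs below are invented for illustration: note how a false triple shared by two models reaches quorum, which is exactly the "consensus is not accuracy" failure the study describes.

```python
# Minimal sketch of ensemble triple extraction by majority vote, with a
# per-triple consensus score. High consensus can mark a shared
# hallucination, so the score alone is not a correctness signal.
from collections import Counter

def ensemble_vote(model_outputs, quorum):
    """Keep triples proposed by at least quorum models; report consensus."""
    counts = Counter(t for output in model_outputs for t in set(output))
    n = len(model_outputs)
    return {t: c / n for t, c in counts.items() if c >= quorum}

outputs = [
    [("Curie", "won", "Nobel Prize"), ("Curie", "born_in", "Paris")],   # wrong
    [("Curie", "won", "Nobel Prize"), ("Curie", "born_in", "Paris")],   # wrong
    [("Curie", "won", "Nobel Prize"), ("Curie", "born_in", "Warsaw")],  # right
]
kept = ensemble_vote(outputs, quorum=2)
# The false "born_in Paris" triple reaches quorum: consensus != accuracy.
```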
AI Bullish · arXiv — CS AI · Mar 27 · 6/10
🧠 Researchers have developed UniAI-GraphRAG, an enhanced framework that improves upon existing GraphRAG systems for complex reasoning and multi-hop queries. The framework introduces three key innovations — ontology-guided extraction, multi-dimensional clustering, and dual-channel fusion — showing superior performance over mainstream solutions like LightRAG on benchmark tests.
AI Bullish · arXiv — CS AI · Mar 11 · 6/10
🧠 Researchers propose EvalAct, a new method that improves retrieval-augmented AI agents by converting retrieval quality assessment into explicit actions and using Process-Calibrated Advantage Rescaling (PCAR) for optimization. The approach shows superior performance on multi-step reasoning tasks across seven open-domain QA benchmarks by providing better process-level feedback signals.
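Making retrieval-quality assessment an explicit action means the agent's trajectory interleaves retrieve and judge steps, and the judgment decides whether to re-retrieve or answer — which is what gives the optimizer a process-level signal to reward. The action names and keyword-overlap judge below are illustrative stand-ins, not EvalAct's implementation.

```python
# Hedged sketch of an agent loop where judging retrieval quality is an
# explicit action: RETRIEVE -> JUDGE -> (retry | ANSWER). Each JUDGE
# step is a process-level event a trainer could assign credit to.
def agent_step(question, retrieved, max_retries=2):
    """Run the retrieve/judge loop over successive retrieval attempts."""
    trace = []
    for attempt, docs in enumerate(retrieved):
        trace.append(("RETRIEVE", attempt))
        relevant = any(w in doc for doc in docs for w in question.split())
        trace.append(("JUDGE", "good" if relevant else "bad"))
        if relevant or attempt == max_retries:
            trace.append(("ANSWER", docs[0] if docs else None))
            break
    return trace

trace = agent_step("capital France",
                   [["weather report"], ["capital of France: Paris"]])
```

The first retrieval is judged bad and retried; the second is judged good and answered — both judgments are visible in the trace for process-level feedback.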
AI Bullish · arXiv — CS AI · Mar 11 · 6/10
🧠 Researchers propose TaSR-RAG, a new framework that improves Retrieval-Augmented Generation systems by using taxonomy-guided structured reasoning for better evidence selection. The system decomposes complex questions into triple sub-queries and performs step-wise evidence matching, achieving up to 14% performance improvements over existing RAG baselines on multi-hop question answering benchmarks.
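Decomposition into triple sub-queries means each hop becomes a (subject, relation, ?) lookup, with each answer filling the next triple's subject. The hand-written decomposition and evidence store below are stand-ins for TaSR-RAG's learned components.

```python
# Sketch of step-wise evidence matching over chained triple sub-queries.
# EVIDENCE is a toy store mapping (subject, relation) to an object; the
# real system matches retrieved passages instead.
EVIDENCE = {
    ("Inception", "directed_by"): "Christopher Nolan",
    ("Christopher Nolan", "born_in"): "London",
}

def answer_triples(triples):
    """Resolve triple sub-queries in order, chaining answers forward."""
    subject = None
    for subj, relation in triples:
        subject = EVIDENCE.get((subj if subj != "?" else subject, relation))
        if subject is None:
            return None  # evidence matching failed at this step
    return subject

# "Where was the director of Inception born?" as two chained triples:
answer = answer_triples([("Inception", "directed_by"), ("?", "born_in")])
```

Because each step is a narrow, typed lookup, irrelevant evidence is filtered out hop by hop rather than retrieved in one noisy batch.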
AI Bullish · arXiv — CS AI · Mar 3 · 6/10
🧠 Researchers introduce ReMemR1, a new approach to improve large language models' ability to handle long-context question answering by integrating memory retrieval into the memory update process. The system enables non-linear reasoning through selective callback of historical memories and uses multi-level reward design to strengthen training.
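Folding retrieval into the update step means that when a new chunk arrives, related older memories are recalled and merged into the stored entry — so later reasoning can reach back non-linearly to earlier context. The word-overlap recall and string-based memory below are assumptions for the sketch, not ReMemR1's actual design.

```python
# Illustrative sketch of memory *retrieval* inside the memory *update*:
# each new chunk recalls related past memories and stores them together.
def update_memory(memory, chunk):
    """Recall related past memories, then append the merged entry."""
    recalled = [m for m in memory if set(m.split()) & set(chunk.split())]
    entry = chunk
    if recalled:
        entry = chunk + " [recalls: " + "; ".join(recalled) + "]"
    memory.append(entry)
    return entry

memory = []
update_memory(memory, "Alice moved to Berlin")
update_memory(memory, "Bob studies physics")
latest = update_memory(memory, "Alice met Bob in Berlin")
```

The third update recalls both earlier memories because it shares entities with each — a callback a purely sequential memory would miss.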
AI Neutral · arXiv — CS AI · Feb 27 · 6/10
🧠 Researchers introduce SPARTA, an automated framework for generating large-scale Table-Text question answering benchmarks that require complex multi-hop reasoning across structured and unstructured data. The benchmark exposes significant weaknesses in current AI models, with state-of-the-art systems experiencing over 30 F1-point performance drops compared to existing simpler datasets.
AI Bullish · arXiv — CS AI · Feb 27 · 6/10
🧠 Researchers introduce RELOOP, a new retrieval-augmented generation framework that improves multi-step question answering across text, tables, and knowledge graphs. The system uses hierarchical sequences and structure-aware iteration to achieve better accuracy while reducing computational costs compared to existing RAG methods.