AI · Neutral · arXiv – CS AI · Jun 19 · 5/10
🧠Researchers propose an optimal scheduling system for question-answering forums staffed by paid knowledge workers rather than volunteers. The study calculates system capacity, designs efficient schedulers, and explores how expert collaboration can improve request-handling throughput.
AI · Bullish · arXiv – CS AI · Jun 10 · 6/10
🧠Researchers propose SAFE, an LLM-as-verifier framework that improves multi-hop question answering by validating reasoning steps against evidence during generation rather than checking only final answers. The approach uses Knowledge Graph triples to decompose reasoning into verifiable units and achieves accuracy improvements of 8.8 percentage points across three benchmarks.
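To make step-level verification concrete, here is a minimal sketch that checks each reasoning step, written as a (subject, relation, object) triple, against a set of evidence triples and reports the first step that fails. It is not SAFE's implementation: SAFE uses an LLM as the verifier, whereas the `Triple` type, `verify_chain`, exact-match lookup, and the rule that each hop must start where the previous one ended are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    """A hypothetical (subject, relation, object) reasoning unit."""
    subject: str
    relation: str
    obj: str


def normalize(t: Triple) -> Triple:
    # Case-fold and trim so trivial surface differences do not fail a step.
    return Triple(t.subject.strip().lower(), t.relation.strip().lower(), t.obj.strip().lower())


def verify_chain(steps: list[Triple], evidence: set[Triple]) -> tuple[bool, Optional[int]]:
    """Return (True, None) if every step is supported and the hops connect,
    otherwise (False, index_of_first_bad_step)."""
    known = {normalize(t) for t in evidence}
    prev_obj = None
    for i, step in enumerate(steps):
        s = normalize(step)
        if s not in known:
            return False, i  # step is not backed by any evidence triple
        if prev_obj is not None and s.subject != prev_obj:
            return False, i  # hop does not continue from the previous answer
        prev_obj = s.obj
    return True, None


evidence = {
    Triple("Inception", "directed by", "Christopher Nolan"),
    Triple("Christopher Nolan", "born in", "London"),
}
steps = [
    Triple("Inception", "directed by", "Christopher Nolan"),
    Triple("Christopher Nolan", "born in", "Paris"),
]
print(verify_chain(steps, evidence))  # (False, 1)
```

Flagging the first unsupported step, rather than only grading the final answer, is what lets a generator be stopped or corrected mid-chain.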
AI · Neutral · arXiv – CS AI · Jun 5 · 6/10
🧠A new research audit challenges the assumed benefits of LLM rewriters in retrieval-augmented QA systems, finding that their performance gains stem primarily from gold answer strings appearing in the rewritten context rather than from genuine passage curation. The study introduces controlled intervention methods to test rewriter claims and shows that conventional evaluation probes are sensitive to methodological choices and can report misleading results.
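The core question here, whether a rewritten context simply contains the gold answer, is easy to probe yourself. The sketch below assumes SQuAD-style answer normalization and per-example correctness labels you already have; the function names are hypothetical, and this is a basic stratification, not the audit's own intervention protocol.

```python
import re
import string


def normalize(text: str) -> str:
    # SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def contains_gold(context: str, gold_answers: list[str]) -> bool:
    """True if any normalized gold answer appears as a whole-token span in the context."""
    ctx = f" {normalize(context)} "
    for gold in gold_answers:
        g = normalize(gold)
        if g and f" {g} " in ctx:  # padding avoids matching inside longer words
            return True
    return False


def stratified_accuracy(records):
    """records: iterable of (rewritten_context, gold_answers, is_correct).
    Returns QA accuracy separately for contexts with and without a gold string."""
    counts = {True: [0, 0], False: [0, 0]}  # has_gold -> [correct, total]
    for context, golds, correct in records:
        bucket = counts[contains_gold(context, golds)]
        bucket[0] += int(correct)
        bucket[1] += 1
    return {
        "gold_present": counts[True][0] / counts[True][1] if counts[True][1] else None,
        "gold_absent": counts[False][0] / counts[False][1] if counts[False][1] else None,
    }


records = [
    ("The capital of France is Paris.", ["Paris"], True),
    ("Paris is large.", ["paris"], True),
    ("A comparison of cities.", ["Paris"], False),
    ("No relevant info here.", ["Paris"], True),
]
print(stratified_accuracy(records))  # {'gold_present': 1.0, 'gold_absent': 0.5}
```

A large accuracy gap between the two buckets is the kind of signal that points to answer-string leakage rather than better passage selection.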
AI · Bullish · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers introduce Harness-1, a 20B parameter search agent that separates semantic decision-making from state management by externalizing working memory to a stateful harness environment. The system achieves 73% average curated recall across eight retrieval benchmarks, outperforming comparable open-source searchers by 11.4 points while generalizing well to held-out transfer tasks.
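The split between decision-making and state management can be pictured as a harness object that owns all working memory while the model only picks actions. This is a minimal sketch under stated assumptions: `SearchHarness`, its `search`/`keep` actions, and the definition of curated recall as the share of relevant documents that end up in the curated set are hypothetical, not Harness-1's actual interface or metric definition.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SearchHarness:
    """Hypothetical harness that owns the agent's working memory.

    The policy (e.g. an LLM) only chooses actions; the harness records
    queries, de-duplicates results, and keeps the curated evidence set.
    """
    search_fn: Callable[[str], list[str]]  # returns document IDs
    issued_queries: list[str] = field(default_factory=list)
    seen_docs: set[str] = field(default_factory=set)
    curated: list[str] = field(default_factory=list)

    def search(self, query: str) -> list[str]:
        self.issued_queries.append(query)
        fresh = []
        for doc_id in self.search_fn(query):
            if doc_id not in self.seen_docs:  # only surface unseen documents
                self.seen_docs.add(doc_id)
                fresh.append(doc_id)
        return fresh

    def keep(self, doc_id: str) -> None:
        # Only documents the agent has actually seen can be curated.
        if doc_id in self.seen_docs and doc_id not in self.curated:
            self.curated.append(doc_id)


def curated_recall(curated: list[str], relevant: set[str]) -> float:
    """Fraction of relevant documents that ended up in the curated set (assumed definition)."""
    return len(set(curated) & relevant) / len(relevant) if relevant else 0.0


index = {"q1": ["d1", "d2"], "q2": ["d2", "d3"]}  # toy search backend
harness = SearchHarness(search_fn=lambda q: index.get(q, []))
print(harness.search("q1"))  # ['d1', 'd2']
print(harness.search("q2"))  # ['d3']
harness.keep("d1")
harness.keep("d3")
print(curated_recall(harness.curated, {"d1", "d2", "d3"}))
```

Because de-duplication and bookkeeping live in the harness, the model's context only needs the latest observation and the current curated list rather than the full search history.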
AI · Neutral · arXiv – CS AI · May 29 · 6/10
🧠Researchers introduce a benchmark for evaluating how AI systems handle conflicting information across multiple memory sources, addressing a critical gap in testing personal AI agents. The study compares several approaches, including fusion methods and LLMs, and finds that trained fusion models outperform prompt-based LLMs by more than 10 percentage points in accuracy, with selective abstention improving performance further.
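Fusion with selective abstention can be illustrated with a reliability-weighted vote that declines to answer when no value has enough support. This is a hedged sketch: the hand-set source weights stand in for what a trained fusion model would learn, and the function names, the 0.6 threshold, and the coverage/accuracy metrics are assumptions, not the benchmark's definitions.

```python
from collections import defaultdict
from typing import Optional


def fuse_with_abstention(claims, reliability, threshold=0.6) -> Optional[str]:
    """claims: list of (source, value) pairs that may disagree.
    reliability: source -> non-negative weight (hand-set here; a trained model would learn it).
    Returns the winning value, or None to abstain when support is too weak."""
    scores = defaultdict(float)
    for source, value in claims:
        scores[value] += reliability.get(source, 0.0)
    total = sum(scores.values())
    if total == 0:
        return None
    value, score = max(scores.items(), key=lambda kv: kv[1])
    return value if score / total >= threshold else None


def selective_metrics(predictions, golds):
    """Coverage (share answered) and accuracy on the answered subset."""
    answered = [(p, g) for p, g in zip(predictions, golds) if p is not None]
    coverage = len(answered) / len(golds) if golds else 0.0
    accuracy = sum(p == g for p, g in answered) / len(answered) if answered else 0.0
    return coverage, accuracy


reliability = {"calendar": 0.9, "chat": 0.5, "email": 0.4}
print(fuse_with_abstention([("calendar", "Tuesday"), ("chat", "Tuesday"), ("email", "Monday")], reliability))  # Tuesday
print(fuse_with_abstention([("chat", "Tuesday"), ("email", "Monday")], reliability))  # None
```

Raising the threshold trades coverage for accuracy on the questions that are answered, which is why abstention can lift measured accuracy.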
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠SearchSkill is a new framework that teaches language models to perform more effective web searches by explicitly planning queries through reusable skill cards rather than treating search as an undifferentiated action. The system maintains an evolving skill bank that improves from failure patterns, demonstrating better performance on knowledge-intensive QA tasks with fewer wasted queries and improved reasoning accuracy.
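The skill-card idea can be pictured as a small data structure. In this sketch a card pairs trigger keywords with a query template, selection is plain keyword overlap, and "improving from failure patterns" is reduced to demoting a failing card and adding a revised one. All names, the selection rule, and the update rule are assumptions for illustration; SearchSkill's actual card format and update procedure are not described in this summary.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SkillCard:
    """Hypothetical reusable search skill: when to use it and how to phrase the query."""
    name: str
    triggers: set[str]  # question keywords that suggest this skill applies
    template: str       # query template with an {entity} slot
    failures: int = 0


@dataclass
class SkillBank:
    cards: list[SkillCard] = field(default_factory=list)

    def select(self, question: str) -> Optional[SkillCard]:
        tokens = set(re.findall(r"[a-z0-9]+", question.lower()))
        # Prefer the most keyword overlap; break ties toward cards with fewer failures.
        best = max(self.cards, key=lambda c: (len(c.triggers & tokens), -c.failures), default=None)
        if best is None or not (best.triggers & tokens):
            return None
        return best

    def plan_query(self, question: str, entity: str) -> str:
        card = self.select(question)
        # Fall back to the raw question when no skill matches.
        return card.template.format(entity=entity) if card else question

    def record_failure(self, card: SkillCard, revised_template: Optional[str] = None) -> None:
        """Log a failure and, optionally, add a revised card distilled from it."""
        card.failures += 1
        if revised_template is not None:
            self.cards.append(SkillCard(f"{card.name}_revised", set(card.triggers), revised_template))


bank = SkillBank([SkillCard("birthplace", {"born", "birthplace"}, "{entity} place of birth")])
q = "Where was Ada Lovelace born?"
print(bank.plan_query(q, "Ada Lovelace"))  # Ada Lovelace place of birth
bank.record_failure(bank.select(q), revised_template="{entity} biography early life")
print(bank.plan_query(q, "Ada Lovelace"))  # Ada Lovelace biography early life
```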
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce PiCA (Pivot-Based Credit Assignment), a novel reinforcement learning mechanism that improves how LLM-based search agents learn from long sequences of actions. By identifying key pivot steps and anchoring rewards to final task outcomes, PiCA addresses core credit-assignment challenges and delivers a 15.2% performance gain on knowledge-intensive QA tasks.
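The general idea of outcome-anchored, pivot-focused credit can be shown with a toy rule. PiCA's actual pivot detection and reward shaping are not given in this summary, so the sketch below takes pivot indices as input and uses an assumed rule: every step receives the final outcome reward discounted by its distance to the next pivot, with the last step always treated as a pivot. The function name and `gamma` are hypothetical.

```python
def pivot_credit(num_steps: int, pivots: set[int], outcome_reward: float, gamma: float = 0.5) -> list[float]:
    """Toy outcome-anchored credit: each step receives the final outcome reward,
    discounted by its distance to the next pivot (the last step always counts as a pivot)."""
    if num_steps <= 0:
        return []
    anchors = sorted({p for p in pivots if 0 <= p < num_steps} | {num_steps - 1})
    credits = []
    j = 0
    for t in range(num_steps):
        while anchors[j] < t:  # advance to the first anchor at or after step t
            j += 1
        credits.append(outcome_reward * gamma ** (anchors[j] - t))
    return credits


print(pivot_credit(5, {1}, outcome_reward=1.0))  # [0.5, 1.0, 0.25, 0.5, 1.0]
print(pivot_credit(5, {1}, outcome_reward=0.0))  # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Compared with spreading a single terminal reward uniformly over a long trajectory, concentrating credit near pivots gives the decisive steps a stronger learning signal.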
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠A new study compares Retrieval-Augmented Generation (RAG) and fine-tuning approaches for adapting Large Language Models to enterprise question-answering tasks in the automotive industry. The research finds that RAG offers superior cost-efficiency while maintaining comparable answer quality, even enabling open-source models to match premium model performance.
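For readers unfamiliar with the RAG side of this comparison, here is a minimal retrieve-then-prompt sketch: passages are ranked by bag-of-words cosine similarity to the question and the top k are placed in the prompt sent to the model. The retriever is an assumed, deliberately simple stand-in (production systems typically use dense embeddings), the passages are invented toy data, and nothing here reflects the study's actual pipeline or models.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    q = Counter(tokenize(question))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    # Ground the model in retrieved text instead of baking knowledge into its weights.
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}\nAnswer:"


docs = [
    "The torque spec for the wheel bolts is 120 Nm.",  # toy data, not a real specification
    "Cabin air filters should be replaced every 15,000 km.",
    "The infotainment system supports wireless updates.",
]
print(build_prompt("What torque should wheel bolts be tightened to?", docs, k=1))
```

Updating the knowledge base here means editing `docs`, with no retraining, which is the usual source of RAG's cost advantage over fine-tuning.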