AIBullisharXiv – CS AI · 2d ago7/10
🧠OmniRetrieval is a new framework that enables unified retrieval across heterogeneous knowledge sources—including unstructured text, relational databases, knowledge graphs, and property graphs—by translating natural language queries into source-native queries rather than forcing all data into a homogenized format. The system demonstrates superior performance compared to single-source retrievers across 13 datasets and 309 knowledge bases, positioning it as a general-purpose interface that preserves the structural advantages of each knowledge source.
AIBullisharXiv – CS AI · 2d ago7/10
🧠Researchers introduced Compass, an LLM agent framework that extracts marine lead data from 230,000+ academic papers without fine-tuning, successfully creating the largest integrated marine lead database with 3,751 previously uncatalogued records and 92% accuracy. The expert-guided approach demonstrates how domain-specific knowledge can overcome LLM hallucinations in high-stakes scientific applications.
AIBullisharXiv – CS AI · 3d ago7/10
🧠FundaPod introduces a multi-persona AI agent platform designed to assist institutional investors in fundamental research by enabling independent agents with different investment perspectives to conduct analysis and surface disagreements for human portfolio manager review. The system uses knowledge graphs and grounded evidence models to create transparent, verifiable investment memos that prioritize human-centric decision-making over automated trading signals.
AIBullisharXiv – CS AI · 3d ago7/10
🧠Researchers demonstrate that knowledge graphs extracted from a single neuroscience textbook can be converted into high-quality training data to fine-tune language models, enabling expert-level reasoning that outperforms larger LLMs while using far fewer parameters. This approach challenges the prevailing assumption that domain expertise requires massive, diverse datasets, showing instead that structured, curated knowledge can produce superior specialized AI systems.
AIBullisharXiv – CS AI · 3d ago7/10
🧠FD-RAG introduces a federated framework for retrieval-augmented generation that enables decentralized LLM deployment across edge devices without centralizing sensitive data. The system achieves 7.8% accuracy improvements and 8.4x latency reductions by splitting lightweight memory access from expensive LLM reasoning, while aggregating anonymized knowledge across fragmented device networks.
AIBullisharXiv – CS AI · 4d ago7/10
🧠GraphDancer is a new post-training framework that enables large language models to reason over heterogeneous graph-structured data by combining natural-language reasoning with graph function execution. The two-stage curriculum approach uses structural complexity ordering to teach models to explore and reason over graphs, achieving strong cross-domain generalization with only a 3B parameter backbone.
AIBearisharXiv – CS AI · May 127/10
🧠Researchers demonstrate 'Oracle Poisoning,' a novel attack where adversaries corrupt knowledge graphs used by AI agents, causing them to reach incorrect conclusions through valid reasoning. Testing across nine models from three providers shows all models accept fabricated data at 100% under moderate attack sophistication, revealing a critical vulnerability in production-scale agentic systems that differs fundamentally from prompt injection attacks.
🧠 GPT-5
AIBullisharXiv – CS AI · May 127/10
🧠Researchers have developed EpiGraph, a comprehensive knowledge graph containing 24,324 entities and 32,009 evidence-grounded triplets from 48,166 peer-reviewed papers to improve AI-driven epilepsy diagnosis and treatment. The accompanying EpiBench benchmark demonstrates that integrating structured clinical knowledge into large language models significantly enhances clinical reasoning, with improvements up to 41% in pharmacogenomic applications.
AIBullisharXiv – CS AI · May 17/10
🧠Researchers introduce Intern-Atlas, a methodological evolution graph built from over 1 million AI papers that automatically maps how research methods develop and relate to one another. The infrastructure captures explicit causal relationships between methodologies and enables AI-driven research agents to reconstruct innovation timelines, addressing a critical gap in existing document-centric research systems.
AIBullisharXiv – CS AI · May 17/10
🧠Researchers introduce ObjectGraph (.og), a new file format designed specifically for how AI agents consume documents through retrieval rather than linear reading. The format reduces token consumption by up to 95.3% while maintaining task accuracy, addressing a fundamental architectural mismatch between traditional documents and LLM agent workflows.
AIBullisharXiv – CS AI · Apr 157/10
🧠Researchers introduce reasoning graphs, a persistent knowledge structure that improves language model reasoning accuracy by storing and reusing chains of thought tied to evidence items. The system achieves 47% error reduction on multi-hop questions and maintains deterministic outputs without model retraining, using only context engineering.
AINeutralarXiv – CS AI · Apr 157/10
🧠Researchers identified a critical failure mode in LLM-based agents called policy-invisible violations, where agents execute actions that appear compliant but breach organizational policies due to missing contextual information. They introduced PhantomPolicy, a benchmark with 600 test cases, and Sentinel, an enforcement framework using counterfactual graph simulation that achieved 93% accuracy in detecting violations compared to 68.8% for baseline approaches.
AIBullisharXiv – CS AI · Apr 147/10
🧠Researchers introduce AtlasKV, a parametric knowledge integration method that enables large language models to leverage billion-scale knowledge graphs while consuming less than 20GB of VRAM. Unlike traditional retrieval-augmented generation (RAG) approaches, AtlasKV integrates knowledge directly into LLM parameters without requiring external retrievers or extended context windows, reducing inference latency and computational overhead.
AINeutralarXiv – CS AI · Apr 147/10
🧠Researchers introduce PaperScope, a comprehensive benchmark for evaluating multi-modal AI systems on complex scientific research tasks across multiple documents. The benchmark reveals that even advanced systems like OpenAI Deep Research and Tongyi Deep Research struggle with long-context retrieval and cross-document reasoning, exposing significant gaps in current AI capabilities for scientific workflows.
🏢 OpenAI
AIBullisharXiv – CS AI · Apr 137/10
🧠Researchers introduce LOM-action, an enterprise AI system that grounds LLM-based decisions in business ontologies and event-driven simulations rather than unrestricted knowledge spaces. The approach achieves 93.82% accuracy with 98.74% F1 scores on decision chains, vastly outperforming larger models like DeepSeek-V3.2, while maintaining complete audit trails for enterprise compliance.
AINeutralarXiv – CS AI · Mar 267/10
🧠Researchers developed a graph-based evaluation framework that transforms clinical guidelines into dynamic benchmarks for testing domain-specific language models. The system addresses key evaluation challenges by providing contamination resistance, comprehensive coverage, and maintainable assessment tools that reveal systematic capability gaps in current AI models.
AIBullisharXiv – CS AI · Mar 267/10
🧠Researchers have developed AI-Supervisor, a multi-agent framework that maintains a persistent Research World Model to autonomously conduct end-to-end AI research supervision. Unlike traditional linear pipelines, the system uses specialized agents with structured gap discovery, self-correcting loops, and consensus mechanisms to continuously evolve research understanding.
AIBullisharXiv – CS AI · Mar 117/10
🧠Researchers introduce MMGraphRAG, a new AI framework that addresses hallucination issues in large language models by integrating visual scene graphs with text knowledge graphs through cross-modal fusion. The system uses SpecLink for entity linking and demonstrates superior performance in multimodal information processing across multiple benchmarks.
AIBullisharXiv – CS AI · Mar 56/10
🧠Researchers propose a dual-helix governance framework to address AI agent reliability issues in WebGIS development, implementing a 3-track architecture that achieved 51% reduction in code complexity. The framework uses knowledge graphs and self-learning cycles to overcome LLM limitations like context constraints and instruction failures.
AIBullisharXiv – CS AI · Mar 57/10
🧠Researchers introduce GraphMERT, an 80M-parameter AI model that efficiently extracts reliable knowledge graphs from unstructured text data. The system outperforms much larger language models like Qwen3-32B in generating factually accurate and semantically valid knowledge graphs, achieving 69.8% FActScore versus 40.2% for the baseline.
AIBullisharXiv – CS AI · Mar 57/10
🧠Researchers developed a new AI training method using knowledge graphs as reward models to improve compositional reasoning in specialized domains. The approach enables smaller 14B parameter models to outperform much larger frontier systems like GPT-5.2 and Gemini 3 Pro on complex multi-hop reasoning tasks in medicine.
🧠 Gemini
AIBullisharXiv – CS AI · Mar 56/10
🧠Researchers propose PlugMem, a task-agnostic plugin memory module for LLM agents that structures episodic memories into knowledge-centric graphs for efficient retrieval. The system consistently outperforms existing memory designs across multiple benchmarks while maintaining transferability between different tasks.
AIBullisharXiv – CS AI · Mar 47/103
🧠Researchers introduce MIRAGE, a novel AI framework that uses knowledge graphs and electronic health records to predict Alzheimer's disease when MRI scans are unavailable. The system improves AD classification rates by 13% compared to single-modality approaches by creating synthetic representations without expensive 3D brain scan reconstruction.
AIBullisharXiv – CS AI · Mar 47/103
🧠Researchers present Odin, the first production-deployed graph intelligence engine that autonomously discovers patterns in knowledge graphs without predefined queries. The system uses a novel COMPASS scoring metric combining structural, semantic, temporal, and community-aware signals, and has been successfully deployed in regulated healthcare and insurance environments.
AINeutralarXiv – CS AI · 2d ago6/10
🧠Researchers introduce RefWalk, a novel framework and RegOps-Bench benchmark for improving Large Language Model compliance with regulatory question-answering tasks. The system addresses critical gaps in citation traceability and attribution accuracy by traversing multi-document regulatory structures, enabling more reliable AI deployment in compliance-critical domains.