#llm-systems News & Analysis

12 articles tagged with #llm-systems. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

🧠

Characterizing Web Search in The Age of Generative AI

Researchers systematically compared generative search systems (Google, OpenAI, Perplexity) with traditional Google search, revealing fundamental differences in retrieval strategies, source diversity, and output stability. Generative search synthesizes web information into coherent responses but exhibits significant variation in reliance on internal knowledge, consistency across executions, and evaluation metrics, necessitating new assessment frameworks.

🏢 OpenAI · 🏢 Perplexity

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

🧠

SpecDB: LLM-Generated Customized Databases via Feature-Oriented Decomposition

SpecDB is an AI system that uses large language models to automatically generate customized relational databases tailored to specific workloads, rather than deploying uniform database systems across all use cases. The generated databases achieve comparable performance to PostgreSQL and MySQL while using only 3% of their code size, demonstrating the viability of AI-driven, purpose-built database synthesis.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

🧠

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

Researchers introduce agent just-in-time (JIT) compilation, a system that compiles natural language task descriptions directly into executable code for computer-use agents, achieving 10.4x speedup and 28% higher accuracy compared to existing sequential approaches. The method combines planning, scheduling, and tool protocol innovations to reduce latency and errors in browser automation tasks.

🏢 OpenAI

AI · Bullish · arXiv – CS AI · May 9 · 7/10

🧠

From Agent Loops to Deterministic Graphs: Execution Lineage for Reproducible AI-Native Work

Researchers introduce execution lineage, a DAG-based execution model that makes AI-native workflows reproducible and maintainable by explicitly tracking dependencies and enabling identity-based replay. Tested against traditional loop-based approaches, the system demonstrated superior performance in preserving work integrity during updates while preventing unrelated context contamination.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

🧠

Context Kubernetes: Declarative Orchestration of Enterprise Knowledge for Agentic AI Systems

Researchers introduce Context Kubernetes, an architecture that applies container orchestration principles to managing enterprise knowledge in AI agent systems. The system addresses critical governance, freshness, and security challenges, demonstrating that without proper controls, AI agents leak data in over 26% of queries and serve stale content silently.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4

🧠

Emergent Coordination in Multi-Agent Language Models

Researchers developed an information-theoretic framework to measure when multi-agent AI systems exhibit coordinated behavior beyond individual agents. The study found that specific prompt designs can transform collections of AI agents into coordinated collectives that mirror human group intelligence principles.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

🧠

GLARE: A Natural Language Interface for Querying Global Explanations

Researchers introduce GLARE, an LLM-based interactive system that translates natural language questions into SQL queries to make global explanations from AI vision models more accessible and usable. The system bridges the gap between complex, static explanation artifacts and human-centered interpretability by enabling users to ask targeted questions about model behavior without needing technical expertise.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

Automated Mediator for Human Negotiation: Pre-Mediation via a Structured LLM Pipeline

Researchers developed an automated mediator using a structured LLM pipeline to support pre-mediation in human negotiations, decomposing the preparation process into specialized modules for dialogue, preference prediction, critique, and summarization. Human-subject experiments show the system achieves outcomes comparable to professional human mediators on self-reported measures while reducing preference-inference errors by 36%, suggesting scalable AI-assisted negotiation preparation is viable.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

Auditable Graph-Guided Root Cause Analysis for Kubernetes Incidents

Researchers present Graph Traversal Agent, an LLM-based root cause analysis system for Kubernetes incidents that combines graph-guided reasoning with deterministic validation tools. The system shows significant performance improvements on benchmarks, though the authors acknowledge limitations in production environments and benchmark-specific coupling.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

🧠

Executable Schema Contracts: From Automatic Ingestion to Multi-Source Retrieval

Researchers present an automated system that discovers executable schemas from multi-source, heterogeneous data and uses them as a unified contract for knowledge graph construction and intelligent query routing. The approach combines LLM-based schema discovery with deterministic structural analysis and demonstrates improved retrieval performance across four QA benchmarks compared to baseline methods.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

🧠

Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability

Researchers introduce a failure-aware observability framework to diagnose wasted computation in multi-agent LLM systems, identifying six failure modes through online trace signals. Testing on 165 GAIA validation traces reveals 41% failure rates across difficulty levels and token consumption ranging from 8,152 to 16,389 tokens, positioning observability as a diagnostic layer between execution logs and accuracy.

AI · Bullish · arXiv – CS AI · May 27 · 6/10

🧠

Natural Language Query to Configuration for Retrieval Agents

Researchers introduce BRANE, an AI system that dynamically selects optimal configurations for retrieval agents by analyzing natural-language queries at inference time. The method reduces serving costs by up to 89% while maintaining accuracy, demonstrating that per-query optimization outperforms traditional static pipeline tuning across multiple benchmarks.