#ai-systems News & Analysis

53 articles tagged with #ai-systems. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 23 · 7/10

From Question Answering to Task Completion: A Survey on Agent System and Harness Design

A comprehensive survey examines LLM-based agent systems through a model-harness lens, arguing that agent performance depends on the interaction between foundation models, execution infrastructure, and task structure rather than model capabilities alone. The research identifies six core runtime responsibilities and maps how different harness configurations affect long-horizon task completion, efficiency, and reliability.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

FleetAgent: Teleoperation Assistant for Autonomous Fleets via Vectorized V2N Messages

FleetAgent is a cloud-based AI system that uses compact vectorized vehicle-to-network messages to assist remote operators in managing autonomous vehicle fleets. The system reduces data transmission costs by up to 625x compared to raw images while improving teleoperation monitoring accuracy and decision-making efficiency.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

LLMs+Graphs: Toward Graph-Native, Synergistic AI Systems

A research paper proposes synergistic AI systems that combine Large Language Models with graph computation and knowledge graphs to overcome LLMs' limitations in structured reasoning and multi-hop inference. The work outlines three complementary approaches: augmenting LLMs with graph computation, bidirectional integration between LLMs and knowledge graphs, and strengthening AI agents with graph algorithms for complex decision-making.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

Atomic Intent Reasoning: Bringing LLM Semantics to Industrial Cross-Domain Recommendations

Researchers introduce AIR (Atomic Intent Reasoning), an LLM-driven framework that enables cross-domain recommendations by moving language model inference offline and dynamically constructing user intents during online operations. The system achieves 400x inference acceleration while maintaining semantic understanding, with real-world testing at Kuaishou E-commerce showing a 3.446% GMV increase.

AI · Neutral · arXiv – CS AI · Jun 5 · 7/10

Continual Learning Bench: Evaluating Frontier AI Systems in Real-World Stateful Environments

Researchers introduce Continual Learning Bench (CL-Bench), the first comprehensive benchmark for evaluating whether LLM-based AI systems genuinely improve through sequential experience across real-world domains. Testing frontier models reveals significant gaps in current continual learning capabilities, with systems frequently overfitting to immediate observations and failing to reuse knowledge effectively.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

Parthenon Law: A Self-Evolving Legal-Agent Framework

Researchers introduce Parthenon, a self-evolving legal-agent framework that addresses critical limitations in deploying AI agents for complex legal work. Through analysis of 12,510 agent trajectories, the study reveals that even frontier LLMs struggle with end-to-end legal task completion, prompting the development of a modular architecture that learns from failures without retraining underlying models.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems

Researchers present SkillReact, a framework measuring compositional safety risks in LLM agent skill ecosystems, finding that 18.2% of individually safe skill pairs create genuine safety vulnerabilities when combined, risks that per-skill scanning alone misses. Testing on 211,575 skill pairs from ClawHub reveals model-dependent execution risk, with smaller models like Haiku more likely to execute unsafe tool chains than larger models like Sonnet.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Model-Native Computing Architecture: Envisioning Future System Architecture Through the Lens of Computer Architecture

Researchers propose the Intelligent Computing Architecture Model (ICAM), a six-layer framework that applies classical computer architecture principles to large language models and agentic AI systems. The paper maps recurring engineering challenges—cache reuse, context management, agent scheduling, and permission control—to traditional systems problems, introducing three design laws to optimize model-native computing efficiency and coordination.

🧠 Claude

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Science Earth: Towards A Planet-Scale Operating System for AI-Native Scientific Discovery

Researchers introduce Science Earth, a planet-scale operating system that enables diverse AI capabilities—from simulation clusters to wet-lab robots to proof engines—to autonomously discover, coordinate, and collaborate on scientific problems without pre-designed workflows. Two validation runs demonstrate the system successfully identifying theoretical gaps in mathematical models and generating novel insights from cancer cell data through distributed, self-correcting reasoning.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

GSAM: A Generalizable and Safe Robotic Framework for Articulated Object Manipulation

GSAM is a new robotic framework that improves articulated object manipulation through vision-based perception, VLM-based refinement with commonsense reasoning, and constraint-based planning to prevent collisions. In experiments across 50 hinge tasks, GSAM achieved 36% higher success rates and 3.1% lower standard deviation compared to existing baselines, demonstrating superior generalization and safety.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

VikingMem: A Memory Base Management System for Stateful LLM-based Applications

Researchers introduce VikingMem, a memory management system for long-term LLM interactions that addresses context window limitations through selective memory extraction, stateful evolution, and temporal weighting. The system demonstrates 30% improvements in memory retrieval effectiveness while maintaining low latency, offering a generalizable solution across diverse applications beyond traditional chatbots.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

Keep the Proof State Live: Snapshotting for Efficient Tactic Search in Lean 4

Researchers introduce proof-state snapshotting, a technique that accelerates automated theorem proving in Lean 4 by reusing elaborated proof states across parallel search branches instead of reconstructing them. The method achieves 5.6-50x speedups (averaging 14x) on benchmark problems, addressing a critical bottleneck where per-branch overhead from import loading and elaboration consumed over 99% of computation time.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Researchers propose MUSE-Autoskill, a framework enabling LLM agents to autonomously create, store, and refine reusable skills throughout their operational lifecycle. The system treats skills as long-lived, testable assets with integrated memory and evaluation mechanisms, demonstrating improved task success rates and cross-agent knowledge transfer on benchmark tests.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

SkillEvolver: Skill Learning as a Meta-Skill

SkillEvolver introduces a meta-learning framework that automatically improves AI agent skills through iterative refinement based on real-world deployment failures, achieving 56.8% accuracy on benchmark tasks compared to 43.6% for manually curated skills. The system learns by modifying skill prose and code rather than model weights, enabling seamless integration with any compatible agent without retraining.

AI · Neutral · arXiv – CS AI · May 9 · 7/10

Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure

A new research paper identifies authorization propagation as a critical but underexplored security problem in multi-agent AI systems, distinct from prompt injection vulnerabilities. The paper argues that identity governance must become foundational infrastructure in AI orchestration, with seven structural requirements for maintaining authorization invariants across distributed agent interactions.

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

Bilevel Optimization of Agent Skills via Monte Carlo Tree Search

Researchers propose a bilevel optimization framework using Monte Carlo Tree Search to systematically improve LLM agent skills—structured collections of instructions, tools, and resources. The framework optimizes both skill structure and component content simultaneously, demonstrating performance improvements on Operations Research tasks and addressing a previously unsolved challenge in agent design optimization.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

From GPT-3 to GPT-5: Mapping their capabilities, scope, limitations, and consequences

A comprehensive comparative study traces the evolution of OpenAI's GPT models from GPT-3 through GPT-5, revealing that successive generations represent far more than incremental capability improvements. The research demonstrates a fundamental shift from simple text predictors to integrated, multimodal systems with tool access and workflow capabilities, while persistent limitations like hallucination and benchmark fragility remain largely unresolved across all versions.

🧠 GPT-4 · 🧠 GPT-5

AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

Glia: A Human-Inspired AI for Automated Systems Design and Optimization

Researchers have developed Glia, an AI architecture using large language models in a multi-agent workflow to autonomously design computer systems mechanisms. The system generates interpretable designs for distributed GPU clusters that match human expert performance while providing novel insights into workload behavior.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

StatePlane: A Cognitive State Plane for Long-Horizon AI Systems Under Bounded Context

Researchers introduce StatePlane, a model-agnostic cognitive state management system that enables AI systems to maintain coherent reasoning over long interaction horizons without expanding context windows or retraining models. The system uses episodic, semantic, and procedural memory mechanisms inspired by cognitive psychology to overcome current limitations in large language models.

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

Researchers developed KernelSkill, a multi-agent framework that optimizes GPU kernel performance using expert knowledge rather than trial-and-error approaches. The system achieved a 100% success rate and significant speedups (1.92x to 5.44x) over existing methods, addressing a critical bottleneck in AI system efficiency.

AI · Neutral · arXiv – CS AI · Mar 12 · 7/10

Defining AI Models and AI Systems: A Framework to Resolve the Boundary Problem

A comprehensive study analyzing 896 academic papers and 80+ regulatory documents reveals critical ambiguities in how 'AI models' and 'AI systems' are defined across regulations like the EU AI Act. The research proposes clear operational definitions to resolve regulatory boundary problems that complicate responsibility allocation across the AI value chain.

AI · Bullish · OpenAI News · Mar 9 · 7/10

OpenAI to acquire Promptfoo

OpenAI is acquiring Promptfoo, an AI security platform that specializes in helping enterprises identify and fix vulnerabilities in AI systems during the development process. This acquisition strengthens OpenAI's security capabilities and enterprise offerings.

🏢 OpenAI

AI · Bullish · arXiv – CS AI · Mar 6 · 7/10

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Researchers introduce AMV-L, a new memory management framework for long-running LLM systems that uses utility-based lifecycle management instead of traditional time-based retention. The system improves throughput by 3.1x and reduces latency by up to 4.7x while maintaining retrieval quality by controlling memory working-set size rather than just retention time.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3

Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

Researchers propose a framework for sustainable AI self-evolution through triadic roles (Proposer, Solver, Verifier) that ensures learnable information gain across iterations. The study identifies three key system designs to prevent the common plateau effect in self-play AI systems: asymmetric co-evolution, capacity growth, and proactive information seeking.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

Researchers introduce MASPOB, a bandit-based framework that optimizes prompts for Multi-Agent Systems using Graph Neural Networks to handle topology-induced coupling. The system reduces search complexity from exponential to linear while achieving state-of-the-art performance across benchmarks.
