#ai-systems News & Analysis

32 articles tagged with #ai-systems. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 9h ago · 7/10

GSAM: A Generalizable and Safe Robotic Framework for Articulated Object Manipulation

GSAM is a new robotic framework that improves articulated object manipulation through vision-based perception, VLM-based refinement with commonsense reasoning, and constraint-based planning to prevent collisions. In experiments across 50 hinge tasks, GSAM achieved 36% higher success rates and 3.1% lower standard deviation compared to existing baselines, demonstrating superior generalization and safety.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

VikingMem: A Memory Base Management System for Stateful LLM-based Applications

Researchers introduce VikingMem, a memory management system for long-term LLM interactions that addresses context window limitations through selective memory extraction, stateful evolution, and temporal weighting. The system demonstrates 30% improvements in memory retrieval effectiveness while maintaining low latency, offering a generalizable solution across diverse applications beyond traditional chatbots.

AI · Bullish · arXiv – CS AI · 3d ago · 7/10

Keep the Proof State Live: Snapshotting for Efficient Tactic Search in Lean 4

Researchers introduce proof-state snapshotting, a technique that accelerates automated theorem proving in Lean 4 by reusing elaborated proof states across parallel search branches instead of reconstructing them. The method achieves 5.6-50x speedups (averaging 14x) on benchmark problems, addressing a critical bottleneck where per-branch overhead from import loading and elaboration consumed over 99% of computation time.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Researchers propose MUSE-Autoskill, a framework enabling LLM agents to autonomously create, store, and refine reusable skills throughout their operational lifecycle. The system treats skills as long-lived, testable assets with integrated memory and evaluation mechanisms, demonstrating improved task success rates and cross-agent knowledge transfer on benchmark tests.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

SkillEvolver: Skill Learning as a Meta-Skill

SkillEvolver introduces a meta-learning framework that automatically improves AI agent skills through iterative refinement based on real-world deployment failures, achieving 56.8% accuracy on benchmark tasks compared to 43.6% for manually curated skills. The system learns by modifying skill prose and code rather than model weights, enabling seamless integration with any compatible agent without retraining.

AI · Neutral · arXiv – CS AI · May 9 · 7/10

Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure

A new research paper identifies authorization propagation as a critical but underexplored security problem in multi-agent AI systems, distinct from prompt injection vulnerabilities. The paper argues that identity governance must become foundational infrastructure in AI orchestration, with seven structural requirements for maintaining authorization invariants across distributed agent interactions.

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

Bilevel Optimization of Agent Skills via Monte Carlo Tree Search

Researchers propose a bilevel optimization framework using Monte Carlo Tree Search to systematically improve LLM agent skills—structured collections of instructions, tools, and resources. The framework optimizes both skill structure and component content simultaneously, demonstrating performance improvements on Operations Research tasks and addressing a previously unsolved challenge in agent design optimization.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

From GPT-3 to GPT-5: Mapping their capabilities, scope, limitations, and consequences

A comprehensive comparative study traces the evolution of OpenAI's GPT models from GPT-3 through GPT-5, revealing that successive generations represent far more than incremental capability improvements. The research demonstrates a fundamental shift from simple text predictors to integrated, multimodal systems with tool access and workflow capabilities, while persistent limitations like hallucination and benchmark fragility remain largely unresolved across all versions.

🧠 GPT-4 · 🧠 GPT-5

AI · Bullish · arXiv – CS AI · Apr 6 · 7/10

Glia: A Human-Inspired AI for Automated Systems Design and Optimization

Researchers have developed Glia, an AI architecture using large language models in a multi-agent workflow to autonomously design computer systems mechanisms. The system generates interpretable designs for distributed GPU clusters that match human expert performance while providing novel insights into workload behavior.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

StatePlane: A Cognitive State Plane for Long-Horizon AI Systems Under Bounded Context

Researchers introduce StatePlane, a model-agnostic cognitive state management system that enables AI systems to maintain coherent reasoning over long interaction horizons without expanding context windows or retraining models. The system uses episodic, semantic, and procedural memory mechanisms inspired by cognitive psychology to overcome current limitations in large language models.

AI · Neutral · arXiv – CS AI · Mar 12 · 7/10

Defining AI Models and AI Systems: A Framework to Resolve the Boundary Problem

A comprehensive study analyzing 896 academic papers and 80+ regulatory documents reveals critical ambiguities in how 'AI models' and 'AI systems' are defined across regulations like the EU AI Act. The research proposes clear operational definitions to resolve regulatory boundary problems that complicate responsibility allocation across the AI value chain.

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

Researchers developed KernelSkill, a multi-agent framework that optimizes GPU kernel performance using expert knowledge rather than trial-and-error approaches. The system achieved 100% success rates and significant speedups (1.92x to 5.44x) over existing methods, addressing a critical bottleneck in AI system efficiency.

AI · Bullish · OpenAI News · Mar 9 · 7/10

OpenAI to acquire Promptfoo

OpenAI is acquiring Promptfoo, an AI security platform that specializes in helping enterprises identify and fix vulnerabilities in AI systems during the development process. This acquisition strengthens OpenAI's security capabilities and enterprise offerings.

🏢 OpenAI

AI · Bullish · arXiv – CS AI · Mar 6 · 7/10

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Researchers introduce AMV-L, a new memory management framework for long-running LLM systems that uses utility-based lifecycle management instead of traditional time-based retention. The system improves throughput by 3.1x and reduces latency by up to 4.7x while maintaining retrieval quality by controlling memory working-set size rather than just retention time.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3

Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain

Researchers propose a framework for sustainable AI self-evolution through triadic roles (Proposer, Solver, Verifier) that ensures learnable information gain across iterations. The study identifies three key system designs to prevent the common plateau effect in self-play AI systems: asymmetric co-evolution, capacity growth, and proactive information seeking.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

Researchers introduce MASPOB, a bandit-based framework that optimizes prompts for Multi-Agent Systems using Graph Neural Networks to handle topology-induced coupling. The system reduces search complexity from exponential to linear while achieving state-of-the-art performance across benchmarks.

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

TADDLE: A Tool-Augmented Agent for Detecting Deficient LLM-Generated Peer Reviews

Researchers introduce TADDLE, an AI system that detects quality deficiencies in LLM-generated peer reviews by decomposing analysis into specialized tools and multi-label classification. The work addresses a growing problem in academic publishing where AI-written reviews are fluent but potentially flawed, backed by the first expert-annotated benchmark of 1,800 reviews across six defect categories.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

MEMOREPAIR: Barrier-First Cascade Repair in Agentic Memory

Researchers introduce MemoRepair, a system that addresses cascade failures in agentic memory by preventing stale or invalidated information from corrupting downstream AI agent decisions. Using a barrier-first approach and graph-based optimization, the system reduces invalid memory exposure from 69-94% to 0% while maintaining 91-94% of valid successor states with significantly lower repair costs.

AI · Neutral · AI News · Apr 14 · 6/10

Hyundai expands into robotics and physical AI systems

Hyundai Motor Group is pivoting toward physical AI systems, integrating artificial intelligence into robots and machinery designed to operate in real-world environments. The company's current focus centers on factory and industrial applications, signaling a major shift in how the automotive giant approaches automation and manufacturing technology.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems

A new benchmark study (RAGSearch) evaluates whether agentic search systems can reduce the need for expensive GraphRAG pipelines by dynamically retrieving information across multiple rounds. Results show agentic search significantly improves standard RAG performance and narrows the gap to GraphRAG, though GraphRAG retains advantages for complex multi-hop reasoning tasks when preprocessing costs are considered.

🏢 Meta

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

Supervising Ralph Wiggum: Exploring a Metacognitive Co-Regulation Agentic AI Loop for Engineering Design

Researchers developed a novel Co-Regulation Design Agentic Loop (CRDAL) system that uses metacognitive agents to improve AI-driven engineering design by reducing design fixation. The system showed better performance than traditional approaches in battery pack design tasks without significantly increasing computational costs.

AI · Neutral · arXiv – CS AI · Mar 16 · 6/10

Literary Narrative as Moral Probe: A Cross-System Framework for Evaluating AI Ethical Reasoning and Refusal Behavior

Researchers developed a new method to evaluate AI ethical reasoning using literary narratives from science fiction, testing 13 AI systems across 24 conditions. The study found that current AI systems produce surface-level ethical responses rather than genuine moral reasoning, with more sophisticated systems showing more complex failure modes.

🏢 Anthropic · 🏢 Microsoft · 🧠 Claude

AI · Bullish · MarkTechPost · Mar 11 · 6/10

How to Build a Self-Designing Meta-Agent That Automatically Constructs, Instantiates, and Refines Task-Specific AI Agents

This tutorial demonstrates building a Meta-Agent system that automatically designs and instantiates task-specific AI agents from simple descriptions. The system dynamically analyzes tasks, selects appropriate tools, configures memory architecture and planners, then creates fully functional agent runtimes without relying on static templates.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 9

NeuroHex: Highly-Efficient Hex Coordinate System for Creating World Models to Enable Adaptive AI

NeuroHex introduces a hexagonal coordinate system inspired by human brain grid cells to create highly efficient world models for adaptive AI systems. The framework achieves 90-99% reduction in geometric complexity while processing real-world map data, offering significant improvements for autonomous AI spatial reasoning and navigation.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 10

From Goals to Aspects, Revisited: An NFR Pattern Language for Agentic AI Systems

Researchers have developed a pattern language methodology to systematically identify and modularize crosscutting concerns in agentic AI systems, addressing issues like security, reliability, and cost management that contribute to high AI project failure rates. The approach uses goal models to discover reusable patterns and implements them through aspect-oriented programming in Rust.