#agentic-systems News & Analysis

64 articles tagged with #agentic-systems. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

Autodata: An agentic data scientist to create high quality synthetic data

Autodata introduces an AI-powered method where agents act as data scientists to autonomously generate high-quality synthetic training and evaluation data. The approach, implemented through Agentic Self-Instruct, demonstrates improved performance over traditional synthetic data creation methods across computer science, legal reasoning, and mathematical reasoning tasks, with further gains achieved through meta-optimization of the data scientist agent itself.

AI · Bullish · AI News · Jun 19 · 7/10

SAP and Google Cloud deploy agentic commerce architecture

SAP and Google Cloud have launched an agentic commerce architecture designed to automate multi-agent marketing and retail operations at enterprise scale. The partnership addresses a critical gap where 78% of businesses view AI as essential for customer retention by 2026, yet fewer than 40% of companies effectively share customer data across CRM and customer experience platforms.

AI × Crypto · Bullish · arXiv – CS AI · Jun 19 · 7/10

DeXposure-Claw: An Agentic System for DeFi Risk Supervision

Researchers introduce DeXposure-Claw, an AI-powered supervision system designed to monitor DeFi credit risks by combining graph time-series forecasting with structured evidence gates to reduce false alarms in regulatory decision-making. The system includes a new evaluation benchmark aligned with regulatory standards, validated on five years of real blockchain data.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

End-to-End Context Compression at Scale

Researchers introduce Latent Context Language Models (LCLMs), a new encoder-decoder compression approach that addresses memory bottlenecks in long-context language model inference. By compressing KV caches at ratios of 1:4 to 1:16 while maintaining model quality, LCLMs enable faster processing of extended contexts and support adaptive expansion for long-horizon agent applications.

AI · Bullish · arXiv – CS AI · Jun 8 · 7/10

DuMate-DeepResearch: An Auditable Multi-Agent System with Recursive Search and Rubric-Grounded Reasoning

Researchers have introduced DuMate-DeepResearch, a multi-agent AI system designed to handle complex research tasks with improved auditability and reasoning. The framework achieves state-of-the-art results on deep research benchmarks by combining dynamic planning, recursive task delegation, and rubric-based quality optimization.

AI · Neutral · arXiv – CS AI · Jun 5 · 7/10

The End of Software Engineering: How AI Agents Are Fundamentally Restructuring the Software Paradigm

A research paper argues that AI agents powered by large language models represent a fundamental paradigm shift in software development, moving beyond traditional static code toward dynamic, self-modifying systems. The analysis traces this evolution from licensing through SaaS and proposes Agent-as-a-Service (AaaS) as the next frontier, citing recent benchmarks that show both transformative potential and current limitations.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Reducing Hallucinations in Complex Question Answering using Simple Graph-based Retrieval-Augmented Generation (long version)

Researchers present a graph-based retrieval-augmented generation (RAG) system that reduces AI hallucinations by integrating lightweight graph structures with vector search tools. Testing on Wikipedia QA benchmarks shows the approach halves hallucinated answers while improving factual precision and recall with minimal token overhead.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Unsupervised Skill Discovery for Agentic Data Analysis

Researchers introduce DataCOPE, an unsupervised framework that enables AI agents to discover and refine data-analysis skills without labeled training data. By using verification signals from exploration trajectories, the system improves agent performance by 9.71% on report-style tasks and 32.30% on reasoning-style tasks, offering a practical approach to enhance analytical AI without costly manual supervision.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding

Researchers introduce Active Video Perception (AVP), an AI framework that enables agents to actively seek relevant evidence in long videos rather than passively processing entire content. The system uses an iterative plan-observe-reflect process to achieve superior accuracy on five benchmarks while reducing inference time by 82% and token usage by 88% compared to existing agentic methods.

AI · Neutral · arXiv – CS AI · Jun 4 · 7/10

R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search

Researchers introduce R-APS (Reflective Adversarial Pareto Search), a novel method that enhances large language model reasoning for constrained design tasks by decomposing reasoning modes into separate contexts and orchestrating them across multiple timescales. The approach delivers 3.5x tighter robustness guarantees and 46% faster convergence on mechanical design problems without requiring model fine-tuning.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

The Deterministic Horizon: When Extended Reasoning Fails and Tool Delegation Becomes Necessary

Researchers establish fundamental information-theoretic limits on decoder-only transformer attention for state-tracking tasks, proving extended reasoning degrades performance beyond a 'Deterministic Horizon' of 19-31 steps. Tool delegation consistently outperforms neural chain-of-thought across 12 models (86-94% vs 24-42% accuracy), suggesting hybrid agentic systems require external tools rather than pure neural reasoning for complex deterministic tasks.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Self-Healing Agentic Orchestrators for Reliable Tool-Augmented Large Language Model Systems

Researchers present a self-healing orchestration framework for tool-augmented large language models that treats reliability as a bounded runtime control problem, achieving 98.8% task success by mapping failure signals to recovery actions and verifying results. The approach outperforms retry-only and full-replanning baselines across multiple benchmarks, particularly excelling when recovery budgets are constrained.
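The idea of mapping failure signals to recovery actions under a bounded budget, then verifying the result, can be illustrated with a minimal Python sketch. The signal names, action names, `run_with_recovery`, and `flaky_tool` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical mapping from failure signals to recovery actions.
# The paper's actual taxonomy is not given in the summary above.
RECOVERY_ACTIONS: Dict[str, str] = {
    "timeout": "retry",
    "bad_arguments": "repair_arguments",
    "tool_unavailable": "switch_tool",
}

ToolResult = Tuple[bool, Optional[str], Optional[str]]  # (ok, result, failure signal)


def run_with_recovery(
    call_tool: Callable[[str], ToolResult],
    verify: Callable[[Optional[str]], bool],
    budget: int = 2,
) -> Tuple[Optional[str], List[str]]:
    """Run a tool call, applying at most `budget` recovery actions."""
    action = "initial"
    history: List[str] = []
    for attempt in range(budget + 1):
        history.append(action)
        ok, result, signal = call_tool(action)
        if ok and verify(result):
            return result, history  # verified success
        if attempt == budget:
            break  # recovery budget exhausted
        if ok:
            signal = "verification_failed"  # tool returned, but the check failed
        # Unknown signals fall back to a generic (assumed) "replan" action.
        action = RECOVERY_ACTIONS.get(signal, "replan")
    return None, history


def flaky_tool(action: str) -> ToolResult:
    """Toy tool: times out on the first call, succeeds after a retry."""
    if action == "initial":
        return False, None, "timeout"
    return True, "42", None


result, history = run_with_recovery(flaky_tool, verify=lambda r: r == "42")
print(result, history)
```

In this toy run the first call times out, the signal maps to a retry, and the retried result passes verification. With `budget=0` the same call would give up after the first failure.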

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Emergent Adaptation

ToolSelf introduces a runtime self-reconfiguration paradigm for LLM-powered agents that dynamically adapts task execution strategies during operation rather than relying on static pre-execution configurations. The approach unifies configuration updates with task execution through a standardized tool interface, achieving 28.8-point performance gains over static baselines after Configuration-Aware Two-stage Training.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

Monitoring Agentic Systems Before They're Reliable

Researchers present a monitoring methodology for agentic AI systems still in early production stages, where structural integration defects rather than task-level errors cause most failures. The approach uses variance-based characterization across three monitoring scopes to identify and triage issues, finding that task-level error detection is often masked by underlying system architecture problems.

AI · Bearish · arXiv – CS AI · Jun 1 · 7/10

Investigating Detection and Obfuscation of Prompt Injection Attacks Against Software Reverse Engineering AI Agents

Researchers have demonstrated that agentic AI systems used for software reverse engineering are vulnerable to prompt injection attacks embedded in executable binaries, and have developed both offensive obfuscation techniques and defensive detection methods. This research highlights critical security gaps in AI-powered code analysis tools that organizations are beginning to deploy in production environments.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

Hallucination Mitigation with Agentic AI, Nested Learning, and AI Sustainability via Semantic Caching

Researchers present a multi-agent LLM pipeline architecture that reduces hallucinations by 31-36% through nested learning, semantic caching, and progressive review stages. The system simultaneously improves factual reliability, cuts energy consumption by 47%, and enhances auditability without requiring model retraining.

AI · Neutral · arXiv – CS AI · May 28 · 7/10

Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems

Researchers propose the SMARt framework, a four-layer autonomous AI system architecture that manages failures through formal escalation protocols rather than relying solely on model improvements. The framework enables AI agents to detect uncertainty, suspend operations, attempt recovery, and surrender control when reliability diminishes, addressing the fundamental architectural vulnerability of unbounded autonomy in deployed agentic systems.
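The detect, suspend, recover, surrender sequence described above can be read as a small state machine. The following Python sketch is one possible reading under stated assumptions: the state names, the confidence threshold of 0.7, and the `next_state` and `run` helpers are hypothetical and do not reproduce the SMARt framework's actual layers or protocols.

```python
from enum import Enum
from typing import List


class State(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    RECOVERING = "recovering"
    SURRENDERED = "surrendered"  # control handed back to a human or supervisor


def next_state(state: State, confidence: float, attempts_left: int,
               threshold: float = 0.7) -> State:
    """Pick the next state from the current state and a confidence estimate."""
    if state is State.RUNNING:
        # Detect uncertainty: suspend when confidence drops below the threshold.
        return State.RUNNING if confidence >= threshold else State.SUSPENDED
    if state is State.SUSPENDED:
        # Attempt recovery only while the (assumed) recovery budget lasts.
        return State.RECOVERING if attempts_left > 0 else State.SURRENDERED
    if state is State.RECOVERING:
        return State.RUNNING if confidence >= threshold else State.SUSPENDED
    return State.SURRENDERED  # surrender is terminal in this sketch


def run(confidences: List[float], max_recoveries: int = 1) -> List[State]:
    """Drive the state machine over a sequence of confidence readings."""
    state, attempts_left = State.RUNNING, max_recoveries
    trace = [state]
    for c in confidences:
        state = next_state(state, c, attempts_left)
        if state is State.RECOVERING:
            attempts_left -= 1  # each recovery attempt consumes budget
        trace.append(state)
        if state is State.SURRENDERED:
            break
    return trace


trace = run([0.9, 0.4, 0.4, 0.3, 0.3])
print([s.value for s in trace])
```

With one recovery attempt allowed, the toy run suspends on the first low reading, tries to recover once, fails, and then surrenders control rather than continuing to act.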

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Prompt Codebooks: Discrete Compositional Optimization for Language Model Instruction Refinement

Researchers introduce Prompt Codebooks (PCO), a new framework for automatic prompt optimization that breaks down instructions into reusable, atomic components rather than treating prompts as fixed strings. The method achieves up to 30% performance gains over baseline approaches while reducing prompt lengths by 14x, enabling more efficient and adaptive language model instruction refinement.

AI · Neutral · arXiv – CS AI · May 27 · 7/10

Beyond Final Answers: Auditing Trajectory-Level Hallucinations in Multi-Agent Industrial Workflows

Researchers introduce Trajel, a dataset and evaluation framework for detecting hallucinations in multi-step LLM agent workflows, revealing that existing benchmarks miss intermediate failures. The framework defines five hallucination types and shows that trajectory-level detection outperforms traditional post-hoc verification, highlighting critical gaps in current AI safety evaluation methodologies.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence

MiniMax introduces the M2 series, a Mixture-of-Experts language model with 229.9B total parameters but only 9.8B activated per token, achieving frontier-tier performance on agentic tasks through agent-driven data pipelines and a custom reinforcement learning system called Forge. The M2.7 checkpoint demonstrates early self-evolution capabilities, autonomously debugging and modifying its own training scaffold.

AI · Bearish · arXiv – CS AI · May 12 · 7/10

When Child Inherits: Modeling and Exploiting Subagent Spawn in Multi-Agent Networks

Researchers have identified critical security vulnerabilities in multi-agent AI networks where compromised parent agents can propagate malicious instructions to spawned subagents through inherited memory. The study demonstrates how current LLM frameworks violate trust boundaries via insecure memory inheritance and weak resource controls, turning localized agent compromises into systemic network risks.

🧠 ChatGPT

AI · Bullish · arXiv – CS AI · May 12 · 7/10

RewardHarness: Self-Evolving Agentic Post-Training

RewardHarness introduces a self-evolving agentic framework that dramatically improves reward modeling for image-editing evaluation using only 0.05% of typical training data. By iteratively refining tools and skills from minimal examples rather than large-scale annotations, the system achieves 47.4% accuracy on benchmarks, outperforming GPT-5 and enabling more efficient AI alignment.

🧠 GPT-5

AI · Bullish · arXiv – CS AI · May 12 · 7/10

AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design

Researchers introduce AHD Agent, a reinforcement learning framework that enables language models to autonomously design heuristics for solving complex combinatorial optimization problems. A 4-billion-parameter model achieves performance comparable to much larger systems while requiring significantly fewer computational evaluations, advancing the frontier of AI-driven algorithm design.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Multi-Modal Multi-Agent Reinforcement Learning for Radiology Report Generation

Researchers introduce MARL-Rad, a multi-agent reinforcement learning framework that optimizes AI agents specifically for radiology report generation rather than using fixed LLMs in pre-designed workflows. The system decomposes chest X-ray interpretation into specialized regional agents coordinated by a global integrator, achieving state-of-the-art clinical performance on benchmark datasets with clinician validation.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Securing the Agent: Vendor-Neutral, Multitenant Enterprise Retrieval and Tool Use

Researchers present a layered security architecture for multitenant enterprise AI systems that isolates data and controls access in retrieval-augmented generation (RAG) and agentic AI deployments. The approach keeps security-critical operations on the server and prevents cross-tenant data leakage, and is validated through an open-source OGX framework with negligible performance overhead.

🏢 OpenAI
