#autonomous-agents News & Analysis

247 articles tagged with #autonomous-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

247 articles

AI × CryptoBullishCrypto Briefing · Jun 257/10

🤖

Sail Research raises $80M to build AI infrastructure for long-running agents

Sail Research has secured $80 million in funding to develop AI infrastructure for long-running autonomous agents. The platform aims to establish efficiency and performance standards that could pressure decentralized networks to compete on cost-effectiveness and operational capabilities.

AI × CryptoBullishNewsBTC · Jun 257/10

🤖

World Network Agentkit Links Verified Humans To Autonomous AI Agents

World Network has launched Agentkit, a platform that connects autonomous AI agents to verified human owners, establishing a verifiable identity layer for agentic commerce. This development addresses a critical gap in the emerging autonomous agent ecosystem by linking digital agents to accountable human principals.

AINeutralarXiv – CS AI · Jun 257/10

🧠

The Unfireable Safety Kernel: Execution-Time AI Alignment for AI Agents and Other Escapable AI Systems

Researchers present the Unfireable Safety Kernel, a formally verified execution-time control mechanism designed to prevent AI agents from circumventing safety constraints. The system uses process separation and cryptographic verification to enforce authorization decisions outside the agent's runtime, addressing vulnerabilities in current safety approaches that rely on internal controls.

AIBullisharXiv – CS AI · Jun 237/10

🧠

AIR: Improving Agent Safety through Incident Response

Researchers introduce AIR, the first incident response framework for LLM agent systems that detects, contains, and recovers from failures autonomously. The framework achieves over 90% success rates across detection, remediation, and eradication, addressing a critical gap in agent safety by shifting focus from prevention-only approaches to active incident management.

AI × CryptoNeutralarXiv – CS AI · Jun 237/10

🤖

Infrastructure for the Agentic Web: Gap Analysis and Architecture from the Agentverse Platform

Fetch.ai's Agentverse platform represents one of the most mature agent-native cloud infrastructures available, yet a comprehensive audit reveals 62 distinct missing capabilities across eight categories. The research proposes a seven-layer Agent Cloud Stack architecture and five critical evolution paths needed to support autonomous AI agents as first-class Web participants by 2030.

$FET

AIBullisharXiv – CS AI · Jun 237/10

🧠

Darwin Mobile Agent: A Roadmap for Self-Evolution

Researchers introduce Darwin Mobile Agent, an open-source infrastructure enabling autonomous reinforcement learning agents to interact with mobile GUIs at scale. The framework addresses data collection bottlenecks through parallel cloud-phone instances and proposes a roadmap to remove human priors from AI agent design, advancing toward truly self-evolving autonomous systems.

AIBullisharXiv – CS AI · Jun 237/10

🧠

SkillHarness: Harnessing Safe Skills for Computer-Use Agents

Researchers introduce SkillHarness, a framework enabling computer-use agents to safely learn and reuse skills in dynamic environments by constraining skill learning against adversarial attacks and environmental disruptions. The system reduces unsafe skill rates by 57.1% compared to existing approaches, addressing a critical vulnerability in AI agents deployed in interactive settings.

AIBullisharXiv – CS AI · Jun 237/10

🧠

ENVS: Environment-Native Verified Search for Long-Horizon GUI Agents

Researchers introduce ENVS (Environment-Native Verified Search), a novel training approach for GUI agents that discovers verified action trajectories in live desktop environments before policy optimization. The method achieves 30.3 pass@8 on OSWorld benchmarks while reducing computational requirements by 25-28% compared to existing reinforcement learning approaches, and demonstrates robust performance even under simulated desktop interruptions.

AIBullisharXiv – CS AI · Jun 237/10

🧠

AOHP: An Open-Source OS-Level Agent Harness for Personalized, Efficient and Secure Interaction

Researchers introduce AOHP, an open-source OS-level agent harness built on Android that treats AI agents as first-class operating system actors. The framework addresses architectural gaps in current systems by enabling personalized service composition, efficient agent interfaces, and secure information flow, demonstrating significant improvements in task completion rates, execution costs, and security compliance.

AI × CryptoNeutralcrypto.news · Jun 217/10

🤖

NEAR’s bet to become the settlement layer for AI agents

NEAR Protocol is positioning itself as the settlement layer for AI agents that operate at machine speed, with a June upgrade designed specifically to support autonomous agent transactions on-chain. The strategy reflects a broader bet that AI agents will become major blockchain users, though execution challenges remain.

$NEAR

AIBullishBlockonomi · Jun 197/10

🧠

Robinhood (HOOD) Stock Surges 42% as AI Trading Feature Attracts 50,000 Users

Robinhood's stock surged 42% following the launch of its AI trading feature, which has attracted 50,000 users conducting millions in daily trades through autonomous agentic accounts. The development signals growing mainstream adoption of AI-driven trading tools and demonstrates market demand for algorithmic portfolio management services.

AIBullisharXiv – CS AI · Jun 197/10

🧠

Before the Pull Request: Mining Multi-Agent Coordination

Researchers introduce grite, an open-source coordination substrate that enables autonomous coding agents to track shared work through git-based event logs, reducing duplicate efforts from 78% to 0% while tripling useful throughput. The system addresses a critical gap in multi-agent collaboration that traditional pull-request metrics cannot capture, revealing previously invisible failure modes like conflicting edits and lock starvation.

AIBullisharXiv – CS AI · Jun 197/10

🧠

Advancing DialNav through Automatic Embodied Dialog Augmentation

Researchers introduce RAINbow, a large-scale dataset of 238K episodes for DialNav, an embodied AI navigation system that requires dialog interaction. Through automatic dataset augmentation, dual-strategy training, and improved localization models, the team achieves significant performance improvements (89-100% gains), advancing the practical deployment of conversational embodied agents.

AIBullisharXiv – CS AI · Jun 197/10

🧠

VOiLA: Vectorized Online Planning with Learned Diffusion Model for POMDP Agents

Researchers introduce VOiLA, a framework that uses learned diffusion models to enable efficient online planning for robots operating under uncertainty in partially observable environments. By distilling diffusion samplers into compact neural networks and integrating with a GPU-parallelized planner, VOiLA reduces computational costs by up to 1000x while outperforming reinforcement learning baselines with 90% less training data.

AIBullisharXiv – CS AI · Jun 197/10

🧠

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

Researchers present the 'Connect the Dots' (CoD) framework for training large language models to function as long-lifecycle agents that learn from experience and progressively improve performance across tasks. The work combines reinforcement learning with self-updating context mechanisms, demonstrating cross-domain generalization capabilities and releasing implementations to advance AI agent research.

AIBullisharXiv – CS AI · Jun 127/10

🧠

Arbor: Tree Search as a Cognition Layer for Autonomous Agents

Arbor introduces a multi-agent framework using tree search as a cognition layer for autonomous agents operating in complex action spaces. The system achieves 193% inference throughput-latency improvements over vendor baselines through coordinated Orchestrator and Critic agents, demonstrating reproducible, hardware-agnostic optimization across multiple hardware generations.

AIBearisharXiv – CS AI · Jun 127/10

🧠

The Containment Gap: How Deployed Agentic AI Frameworks Fail Public-Facing Safety Requirements

Researchers found that three major agentic AI frameworks (LangChain, AutoGPT, OpenAI Agents SDK) lack native safety guarantees required for public-facing deployments. A memory-poisoning attack demonstrated on a government benefits system increased wrongful denials to 88.9%, highlighting critical vulnerabilities in systems handling sensitive applications like healthcare and financial advising.

🏢 OpenAI

AINeutralMIT Technology Review · Jun 117/10

🧠

Google DeepMind is worried about what happens when millions of agents start to interact

Google DeepMind is investing in research to understand the risks of millions of AI agents interacting autonomously online without human oversight. The concern centers on scenarios where these agents follow instructions from other agents, potentially creating unpredictable emergent behaviors at scale.

🏢 Google

AI × CryptoBearishCrypto Briefing · Jun 117/10

🤖

Agents’ Last Exam reveals AI agents struggle with real work tasks, passing just 2.6% of the time

A recent study called 'Agents' Last Exam' reveals that AI agents successfully complete real-world work tasks only 2.6% of the time, exposing significant limitations in current AI model capabilities. This finding underscores the substantial gap between AI's theoretical potential and practical performance, necessitating major improvements in model architecture and training methodologies before widespread deployment in critical applications.

AI × CryptoBullishBlockonomi · Jun 107/10

🤖

XRP Ledger Makes Strategic Move Into AI Payments

RippleX launched the XRP Ledger AI Starter Kit on June 10, enabling autonomous payments for AI agents through XRP and RLUSD tokens. The integration of the X402 protocol allows AI systems to pay for APIs, model inference, and digital services directly on the blockchain, representing a strategic convergence of AI infrastructure and cryptocurrency payment rails.

$XRP

AI × CryptoBullishCrypto Briefing · Jun 107/10

🤖

Ripple debuts toolkit for autonomous AI transactions on XRP Ledger

Ripple has launched x402, a toolkit enabling autonomous AI transactions on the XRP Ledger. The development aims to streamline AI-driven payment processing and could significantly impact how digital commerce handles automated transactions at scale.

$XRP

AINeutralarXiv – CS AI · Jun 107/10

🧠

PreAct-Bench: Benchmarking Predictive Monitoring in LLMs

Researchers introduce PreAct-Bench, a benchmark for evaluating LLMs' ability to predict unethical behavior from partial action trajectories before harmful actions occur. The study reveals that predictive monitoring remains a significant challenge even for advanced models, highlighting a critical gap in proactive AI safety mechanisms.

AIBullisharXiv – CS AI · Jun 107/10

🧠

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning

Researchers introduce ActiveMem, a distributed memory framework that decouples storage from reasoning in large language models, enabling agents to handle longer tasks without context overload. The system separates executive planning from memory management—inspired by human brain architecture—and demonstrates state-of-the-art performance on complex reasoning benchmarks while reducing computational overhead.

AIBullisharXiv – CS AI · Jun 107/10

🧠

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

Researchers introduce 3SPO (State-Score-Supervised Policy Optimization), a reinforcement learning algorithm that optimizes LLM agent policies at each step rather than after complete episodes, addressing credit assignment challenges in sparse-reward environments. Experiments demonstrate 22.6% improvement over existing methods on ALFWorld benchmarks with 2.4x more state exploration and 1.8x faster convergence.

AIBullisharXiv – CS AI · Jun 107/10

🧠

A History-Aware Visually Grounded Critic for Computer Use Agents

Researchers introduce HiViG, a test-time framework that enhances Computer Use Agents through history-aware and visually grounded critic models. The system improves GUI task performance by 5.8-9.0% across web, mobile, and desktop platforms by maintaining action history and verifying execution coordinates against visual interfaces.

🧠 Gemini

Page 1 of 10Next →