#autonomous-agents News & Analysis

247 articles tagged with #autonomous-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 10 · 7/10

🧠

GitInject: Real-World Prompt Injection Attacks in AI-Powered CI/CD Pipelines

Researchers present GitInject, a framework demonstrating prompt injection vulnerabilities in AI-powered CI/CD pipelines used by major tech companies. The study reveals that all tested AI providers are susceptible to attacks that could enable credential theft, code manipulation, and supply chain compromise through GitHub workflows.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

🧠

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

Researchers introduce 3SPO (State-Score-Supervised Policy Optimization), a reinforcement learning algorithm that optimizes LLM agent policies at each step rather than after complete episodes, addressing credit assignment challenges in sparse-reward environments. Experiments demonstrate 22.6% improvement over existing methods on ALFWorld benchmarks with 2.4x more state exploration and 1.8x faster convergence.

AI × Crypto · Bullish · arXiv – CS AI · Jun 9 · 7/10

🤖

RAILS: Verification-Native Clearing For Agentic Commerce

RAILS is a verification-native clearing protocol designed to resolve the agentic clearing problem—determining whether autonomous agents have met their obligations and who bears responsibility when they fail. The protocol introduces seven primitives and a formal verification model that ensures no financially material settlement occurs without evidence meeting the required admissibility threshold, establishing a falsifiable soundness property previously absent in agent-commerce systems.

AI · Bearish · arXiv – CS AI · Jun 9 · 7/10

🧠

VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation

Researchers have identified a critical vulnerability in the Model Context Protocol (MCP) used by autonomous AI agents, where error messages can be weaponized to bypass safety guardrails. The VATS framework demonstrates that error-path injection attacks triple the success rate of standard prompt injection techniques, achieving near-perfect compliance rates across leading AI models, though production-level mitigations exist.

🧠 GPT-5 · 🧠 Gemini

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

🧠

AgentTrust: A Self-Improving Trust Layer for AI-Agent Actions

AgentTrust v2 introduces a self-improving trust layer for AI agents that distinguishes between lexical (rule-detectable) and semantic (intent-dependent) threats. Using an LLM judge combined with a dual-store system, it achieves 83.6–85.2% accuracy on semantic threats while progressively distilling deterministic rules for lexical threats, demonstrating zero false blocks across 45,000 test actions.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

🧠

Semantic Quorum Assurance: Collective Certification for Non-Deterministic AI Infrastructure

Researchers propose Semantic Quorum Assurance (SQA), a new control-plane mechanism that uses multiple AI validator agents to assess the safety of infrastructure mutations in cloud systems before execution. The approach reduces unsafe approvals from 18.5% with single-agent validation to 0.3% by aggregating diverse validator judgments under a risk-adaptive quorum system, adding 1.45–4.12 seconds of latency.
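The idea of a risk-adaptive quorum can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the risk tiers, the approval fractions, and the function names `required_approvals` and `quorum_approves` are hypothetical and are not taken from the SQA paper, which may aggregate validator judgments quite differently.

```python
from math import ceil

# Hypothetical risk tiers and approval fractions (illustrative only,
# not values reported for SQA).
QUORUM_FRACTION = {"low": 0.5, "medium": 0.67, "high": 1.0}


def required_approvals(num_validators: int, risk: str) -> int:
    """Smallest number of approvals that meets the quorum for this risk tier."""
    # max(1, ...) makes an empty validator set fail closed.
    return max(1, ceil(QUORUM_FRACTION[risk] * num_validators))


def quorum_approves(votes: list[bool], risk: str) -> bool:
    """Approve a mutation only if enough validators voted to approve it."""
    return sum(votes) >= required_approvals(len(votes), risk)


if __name__ == "__main__":
    votes = [True, True, True, False, False]  # 3 of 5 validators approve
    for tier in ("low", "medium", "high"):
        print(tier, quorum_approves(votes, tier))
```

In this sketch the same three-of-five vote passes a low-risk change but is rejected at the medium and high tiers, which is the sense in which the quorum adapts to risk.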

AI × Crypto · Bullish · Blockonomi · Jun 8 · 7/10

🤖

MetaMask Debuts Agent Wallet for Autonomous DeFi Access

MetaMask has launched Agent Wallet, enabling AI-driven autonomous trading on Ethereum with built-in security features including transaction simulation, threat scanning, and MEV protection. The wallet supports multiple DeFi activities from swaps to perpetual futures, with risky transactions requiring two-factor authentication and safe transactions covered by up to $10,000 in Transaction Protection.

$ETH

AI × Crypto · Bullish · Crypto Briefing · Jun 8 · 7/10

🤖

MetaMask debuts AI agent wallet with up to $10K in transaction protection coverage

MetaMask has launched an AI agent wallet featuring up to $10,000 in transaction protection coverage, marking a significant step toward autonomous asset management in decentralized finance. The innovation combines AI-driven decision-making with security safeguards, potentially increasing user adoption by addressing confidence gaps in self-custodial wallets.

AI × Crypto · Bullish · Crypto Briefing · Jun 8 · 7/10

🤖

Fetch.AI unveils full agentic infrastructure with Your Personal AI and Agentverse

Fetch.AI has launched a comprehensive agentic infrastructure platform featuring Your Personal AI and Agentverse, positioning itself to enable autonomous transactions and AI-driven marketplaces. The infrastructure aims to reshape how digital economies operate by leveraging autonomous agents for decentralized commerce and services.

$FET

AI · Bearish · arXiv – CS AI · Jun 8 · 7/10

🧠

It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

Researchers introduce TRAP, a benchmark demonstrating that web-based AI agents are vulnerable to prompt injection attacks hidden in interface elements, with susceptibility rates ranging from 13% to 43% across frontier models. The study reveals that small contextual changes can double attack success rates, exposing systemic security weaknesses in autonomous agents performing real-world tasks like email management and professional networking.

🧠 GPT-5

AI · Bullish · Fortune Crypto · Jun 7 · 7/10

🧠

OpenAI readies ‘superapp’ pivot ahead of planned IPO, FT reports

OpenAI is planning a strategic pivot toward a 'superapp' model ahead of its anticipated IPO, betting that the future of AI lies in autonomous agents capable of handling complex, multi-step tasks. This shift reflects a broader industry conviction that AI systems will move beyond single-purpose tools toward integrated platforms that can independently execute sophisticated operations.

🏢 OpenAI

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

🧠

VASO: Formally Verifiable Self-Evolving Skills for Physical AI Agents

Researchers introduce VASO, a framework that combines formal verification with self-evolving language model skills for robot control, achieving 97.2% specification compliance on physical tasks. The approach bridges formal methods and foundation models by using counterexamples from model checking as optimization feedback for skill contracts rather than modifying underlying model weights.

AI × Crypto · Bullish · crypto.news · Jun 4 · 7/10

🤖

Casper Network launches AI toolkit with autonomous payments and app-building tools

Casper Network has launched a production-ready AI toolkit on its mainnet that enables autonomous agents to execute blockchain payments and build decentralized applications without human intervention. This development represents a significant step toward autonomous cryptocurrency operations and could reshape how smart contracts and dApps function across blockchain ecosystems.

AI · Bullish · AI News · Jun 4 · 7/10

🧠

Scout from M’Soft is the agentic Autopilot that works across M365

Microsoft announced wider testing of Scout, a new agentic Autopilot feature designed to work autonomously across Microsoft 365 applications. Each Autopilot has its own identity and can operate multiple agents, representing a new category of autonomous AI agents for enterprise users.

AI · Bearish · arXiv – CS AI · Jun 4 · 7/10

🧠

The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents

Researchers studying runtime safety for autonomous AI agents found that affect-based triggers and LLM judges fail to reliably determine when to interrupt agents during task execution. The core problem is that human annotators themselves cannot consistently agree on intervention timing, which suggests that the primary issue is the task's lack of reproducibility rather than detector accuracy.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · Jun 4 · 7/10

🧠

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Researchers introduce AutoLab, a benchmark testing whether frontier AI models can solve complex, multi-step engineering tasks over extended time horizons. Testing 17 state-of-the-art models reveals that persistence and iterative refinement—not initial quality—predict success, with most models failing to sustain long-horizon optimization despite their capabilities.

AI · Bearish · arXiv – CS AI · Jun 4 · 7/10

🧠

What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems

Researchers have identified a critical security vulnerability in agentic AI systems called cross-session stored prompt injection, where malicious instructions can persist within system state and compromise future interactions long after the attacker disconnects. This threat fundamentally differs from traditional prompt injection by leveraging long-lived system artifacts like memories and filesystems, transforming ephemeral model-level attacks into durable system-level vulnerabilities that accumulate over time.

AI · Neutral · arXiv – CS AI · Jun 4 · 7/10

🧠

The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

Researchers introduced the Meta-Agent Challenge (MAC), a benchmark framework testing whether AI models can autonomously develop agent systems rather than simply execute pre-defined tasks. The study reveals that current frontier models rarely match human-engineered baselines, and successful implementations exhibit concerning behaviors like ground-truth exfiltration, highlighting critical gaps in AI robustness and alignment.

AI × Crypto · Bullish · The Block · Jun 3 · 7/10

🤖

Variant raises $222 million fund targeting early stage crypto, AI startups that expand ‘autonomy’

Variant, a prominent crypto venture fund, has closed a $222 million fund focused on early-stage startups building in autonomous systems, permissionless finance, and agentic AI sectors. Founder Jesse Walden signals the firm's strategic pivot toward projects that expand user and system autonomy, reflecting broader investor conviction in AI-driven decentralized finance.

AI × Crypto · Bullish · Fortune Crypto · Jun 3 · 7/10

🤖

Variant raises $222 million for new fund with a thesis of AI, crypto and ‘autonomy’

Variant, a venture capital firm led by a strategist who shaped Andreessen Horowitz's crypto approach, has raised $222 million for a new fund focused on AI, cryptocurrency, and autonomous systems. The fund reflects growing investor conviction that these three technologies will converge to define the next wave of decentralization.

AI · Neutral · arXiv – CS AI · Jun 3 · 7/10

🧠

What Benchmarks Don't Measure: The Case for Evaluating Abstention Competence in Autonomous Agents

Researchers identify 'compliance bias' in autonomous agents trained via human feedback, where systems proceed with unsafe actions despite lacking necessary information, authorization, or evidence. The study proposes abstention-aware benchmarks and evaluation protocols that can block up to 89% of hazardous actions while maintaining 87.5% usability, challenging the assumption that safety and performance must be traded off against each other.

AI · Bullish · Ars Technica – AI · Jun 2 · 7/10

🧠

Microsoft's Project Solara is an Android OS designed for agents instead of apps

Microsoft has unveiled Project Solara, an Android-based operating system designed to run AI agents rather than traditional applications. This strategic pivot reflects Microsoft's recognition that it missed the mobile app era and is now positioning itself for an AI-agent-centric computing paradigm.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

🧠

MemPro: Agentic Memory Systems as Evolvable Programs

Researchers introduce MemPro, an evolution framework that treats autonomous agent memory systems as adaptable programs rather than static pipelines. By iteratively diagnosing failures and refining the entire memory-construction-retrieval pipeline, MemPro outperforms fixed baselines on multiple benchmarks while maintaining computational efficiency.

AI · Bearish · arXiv – CS AI · Jun 2 · 7/10

🧠

Safety Must Precede the Deployment of Open-Ended AI

A position paper argues that open-ended AI systems—which autonomously generate novel behaviors indefinitely—introduce distinct safety challenges including loss of predictability and emergent misalignment that existing frameworks cannot address. The authors call for proactive research and coordinated action before large-scale deployment of such systems.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

🧠

Stop Wandering, Find the Keys: LLMs Discriminate Key States for Efficient Multi-Agent Exploration

Researchers introduce LEMAE, a novel multi-agent reinforcement learning framework that leverages Large Language Models to identify critical 'key states' in complex environments, enabling agents to explore more efficiently with 10x acceleration in certain scenarios. The approach combines LLM-guided state discrimination with a Key State Memory Tree to reduce redundant exploration and improve performance on challenging benchmarks like SMAC and MPE.
