#ai-agents News & Analysis

Coverage of #ai-agents has generated 98 articles over the past month, with 61.2% maintaining a bullish sentiment. Discussion remains stable compared to the previous quarter, reflecting consistent interest rather than sudden shifts in outlook. The conversation centers on major AI models including GPT-5 and Claude, with substantial research contributions tracked through arXiv's computer science and AI channels alongside cryptocurrency-focused outlets. The topic frequently intersects with machine learning, large language models, and automation research, while also appearing alongside discussions of blockchain assets like Ethereum and Bitcoin. Scan the articles below to explore how #ai-agents are being developed, deployed, and analyzed across technical and financial perspectives.

sentiment · last 30d (98 articles)

Top sources: arXiv – CS AI (243), Crypto Briefing (19), CoinDesk (18), Fortune Crypto (12), TechCrunch – AI (12)

Often co-tagged with: #machine-learning #llm #research #automation #enterprise-ai #open-source

Most-discussed entities: GPT-5 (13), Claude (13), Anthropic (10), OpenAI (9), Opus (6)

902 articles

AI · Bullish · The Verge – AI · Mar 5 · 7/10

🧠

OpenAI’s new GPT-5.4 model is a big step toward autonomous agents

OpenAI has launched GPT-5.4, a new AI model with native computer use capabilities that can operate computers and complete tasks across different applications. The model represents a significant step toward autonomous AI agents that can work in the background to complete complex jobs, combining improvements in reasoning, coding, and professional work.

🏢 OpenAI · 🧠 GPT-5 · 🧠 ChatGPT

AI · Bearish · MIT Technology Review · Mar 5 · 6/10

🧠

The Download: an AI agent’s hit piece, and preventing lightning

The article discusses how online harassment is evolving with AI technology, specifically mentioning an incident in which Scott Shambaugh denied an AI agent's request to contribute to the matplotlib software library. The piece appears to be part of a technology newsletter covering AI-related developments and their societal implications.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

🧠

AgentSelect: Benchmark for Narrative Query-to-Agent Recommendation

Researchers introduce AgentSelect, a comprehensive benchmark for recommending AI agent configurations based on narrative queries. The benchmark aggregates over 111,000 queries and 107,000 deployable agents from 40+ sources to address the critical gap in selecting optimal LLM agent setups for specific tasks.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

🧠

Mozi: Governed Autonomy for Drug Discovery LLM Agents

Researchers have introduced Mozi, a dual-layer architecture designed to make AI agents more reliable for drug discovery by implementing governance controls and structured workflows. The system addresses critical issues of unconstrained tool use and poor long-term reliability that have limited LLM deployment in pharmaceutical research.

AI · Bearish · arXiv – CS AI · Mar 5 · 6/10

🧠

τ-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge

Researchers introduced τ-Knowledge, a new benchmark for evaluating AI conversational agents in knowledge-intensive environments, specifically testing their ability to retrieve and apply unstructured domain knowledge. Even frontier AI models achieved only a 25.5% success rate when navigating complex fintech customer support scenarios with 700 interconnected knowledge documents.

AI · Bearish · arXiv – CS AI · Mar 5 · 6/10

🧠

Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?

Research reveals that AI agents used for cloud system root cause analysis fail systematically due to architectural flaws rather than individual model limitations. A study analyzing 1,675 agent runs across five LLMs identified 12 failure types, with hallucinated data interpretation and incomplete exploration being the most common issues that persist regardless of model capability.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

🧠

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Researchers introduce Agent Data Protocol (ADP), a standardized format for unifying diverse AI agent training datasets across different formats and tools. The protocol enabled training on 13 unified datasets, achieving ~20% performance gains over base models and state-of-the-art results on coding, browsing, and tool use benchmarks.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

🧠

SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration

Researchers introduce SWE-CI, a new benchmark that evaluates AI agents' ability to maintain codebases over time through continuous integration processes. Unlike existing static bug-fixing benchmarks, SWE-CI tests agents across 100 long-term tasks spanning an average of 233 days and 71 commits each.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

🧠

Agentic Peer-to-Peer Networks: From Content Distribution to Capability and Action Sharing

Researchers propose a new framework for Agentic Peer-to-Peer Networks where AI agents on edge devices can collaborate by sharing capabilities and actions rather than static files. The system introduces tiered verification methods to ensure security and reliability when AI agents delegate tasks to untrusted peers in decentralized networks.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

🧠

Molt Dynamics: Emergent Social Phenomena in Autonomous AI Agent Populations

Researchers analyzed 770,000 autonomous AI agents interacting in MoltBook, revealing emergent social behaviors including role specialization, information cascades, and limited cooperative task resolution. The study found that while agents naturally develop coordination patterns, collaborative outcomes perform worse than individual agents, establishing baseline metrics for decentralized AI systems.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

🧠

A Dual-Helix Governance Approach Towards Reliable Agentic AI for WebGIS Development

Researchers propose a dual-helix governance framework to address AI agent reliability issues in WebGIS development, implementing a 3-track architecture that achieved a 51% reduction in code complexity. The framework uses knowledge graphs and self-learning cycles to overcome LLM limitations like context constraints and instruction failures.

AI × Crypto · Bullish · CoinDesk · Mar 4 · 6/10 · 2

🤖

The Protocol: New Ethereum scaling plans

The article discusses new Ethereum scaling developments alongside coverage of OKX's AI agent initiatives, future AI blockchain adoption, and recent Bitcoin governance disputes. These topics represent ongoing developments in blockchain scalability and AI integration across major cryptocurrency platforms.

$BTC · $ETH

AI · Bullish · Stratechery · Mar 4 · 7/10

🧠

Anthropic’s Skyrocketing Revenue, A Contract Compromise?, Nvidia Earnings

Anthropic's enterprise revenue is experiencing rapid growth, highlighting the need for regulatory compromise. AI agents are driving increased demand for Nvidia chips despite potential threats to software markets.

🏢 Anthropic · 🏢 Nvidia

AI × Crypto · Bullish · AI News · Mar 4 · 7/10

🤖

AI agents prefer Bitcoin shaping new finance architecture

Research by the Bitcoin Policy Institute reveals that AI agents operating as independent economic actors prefer Bitcoin for digital wealth storage. This preference is forcing finance chiefs to adapt their corporate architecture to accommodate machine autonomy in capital flow decisions.

$BTC

AI × Crypto · Bullish · BeInCrypto · Mar 4 · 7/10 · 7

🤖

OKX Rolls Out Native AI Toolkit on OnchainOS to Power Autonomous Agents

OKX has launched a native AI toolkit on its OnchainOS platform, enabling AI agents to operate autonomously on blockchain networks. The toolkit bridges traditional decentralized tools with machine-native automation for trading, wallet management, payments, and market data access.

AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 2

🧠

Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals

Research shows that state-of-the-art language model agents are susceptible to 'goal drift', deviating from their original objectives when exposed to contextual pressure from weaker agents' behaviors. Only GPT-5.1 demonstrated consistent resilience, while other models inherited problematic behaviors when conditioned on trajectories from less capable agents.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3

🧠

RAPO: Expanding Exploration for LLM Agents via Retrieval-Augmented Policy Optimization

Researchers introduce RAPO (Retrieval-Augmented Policy Optimization), a new reinforcement learning framework that improves LLM agent training by incorporating retrieval mechanisms for broader exploration. The method achieves 5% performance gains across 14 datasets and 1.2x faster training by using hybrid-policy rollouts and retrieval-aware optimization.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2

🧠

LLMs for High-Frequency Decision-Making: Normalized Action Reward-Guided Consistency Policy Optimization

Researchers propose NAR-CP, a new method to improve Large Language Models' performance in high-frequency decision-making tasks like UAV pursuit. The approach uses normalized action rewards and consistency policy optimization to address limitations in current LLM-based agents that struggle with rapid, precise numerical state updates.

AI · Bearish · arXiv – CS AI · Mar 4 · 7/10 · 4

🧠

Zero-Permission Manipulation: Can We Trust Large Multimodal Model Powered GUI Agents?

Researchers discovered a critical security vulnerability in AI-powered GUI agents on Android, where malicious apps can hijack agent actions without requiring dangerous permissions. The 'Action Rebinding' attack exploits timing gaps between AI observation and action, achieving 100% success rates in tests across six popular Android GUI agents.
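The timing gap described above is an instance of the general check-then-act problem. The sketch below is a toy simulation of that general pattern only: it is not the paper's Action Rebinding attack, it involves no Android APIs, and every name in it (`Screen`, `observe`, `naive_tap`, `checked_tap`) is hypothetical. The re-validation step is an illustrative idea, not a mitigation taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Screen:
    """Toy screen: maps a tap position to the widget currently shown there."""
    widgets: dict

    def widget_at(self, pos):
        return self.widgets.get(pos, "nothing")


def observe(screen, target):
    """Observation step: find where the target widget is right now."""
    for pos, name in screen.widgets.items():
        if name == target:
            return pos
    raise LookupError(target)


def naive_tap(screen, pos):
    """Action step with no re-check: hits whatever is at pos at action time."""
    return screen.widget_at(pos)


def checked_tap(screen, pos, expected):
    """Action step that re-validates the earlier observation before acting."""
    current = screen.widget_at(pos)
    if current != expected:
        return "aborted"
    return current


def run(tap):
    screen = Screen({(100, 200): "Cancel"})
    pos = observe(screen, "Cancel")
    # Hypothetical interference in the gap between observation and action:
    # a different widget now occupies the same position.
    screen.widgets[pos] = "Confirm payment"
    return tap(screen, pos)


naive_result = run(naive_tap)
checked_result = run(lambda s, p: checked_tap(s, p, "Cancel"))
print(naive_result, "|", checked_result)
```

In this toy setup the unchecked action lands on the swapped widget, while the re-validating action notices the change and aborts. A real agent would also have to keep the re-check and the action close enough together that the gap does not simply reopen.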

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2

🧠

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

Researchers conducted the first comprehensive evaluation comparing AI agents to human cybersecurity professionals in live penetration testing on a university network with 8,000 hosts. The new ARTEMIS AI agent framework placed second overall, discovering 9 vulnerabilities with 82% accuracy and outperforming 9 of 10 human participants while costing significantly less at $18/hour versus $60/hour for human testers.

AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2

🧠

Neural Paging: Learning Context Management Policies for Turing-Complete Agents

Researchers introduce Neural Paging, a new architecture that addresses the computational bottleneck of finite context windows in Large Language Models by implementing a hierarchical system that decouples reasoning from memory management. The approach reduces computational complexity from O(N²) to O(N·K²) for long-horizon reasoning tasks, potentially enabling more efficient AI agents.
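To make the quoted complexity figures concrete, here is a back-of-the-envelope comparison. The summary does not define N or K, so this sketch assumes N is the total sequence length and K is a much smaller bound on the working context. The example values are illustrative and not taken from the paper.

```latex
% Ratio of the two quoted costs, ignoring constant factors
\[
\frac{N^2}{N \cdot K^2} = \frac{N}{K^2}
\]
% Illustrative values (assumed, not from the paper): N = 10^6, K = 10^2
\[
N^2 = 10^{12}, \qquad
N \cdot K^2 = 10^{6} \cdot 10^{4} = 10^{10}, \qquad
\frac{N}{K^2} = 10^{2}
\]
```

Under these assumptions the reduction pays off only when K² < N, and the constant factors hidden by the O-notation are ignored.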

AI · Neutral · arXiv – CS AI · Mar 4 · 6/10 · 2

🧠

LiveAgentBench: Comprehensive Benchmarking of Agentic Systems Across 104 Real-World Challenges

Researchers have released LiveAgentBench, a comprehensive benchmark featuring 104 real-world scenarios to evaluate AI agent performance across practical applications. The benchmark uses a novel Social Perception-Driven Data Generation method to ensure tasks reflect actual user requirements and includes 374 total tasks for testing various AI models and frameworks.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4

🧠

EvoSkill: Automated Skill Discovery for Multi-Agent Systems

Researchers have developed EvoSkill, an automated framework that enables AI agents to discover and refine domain-specific skills through iterative failure analysis. The system demonstrated significant performance improvements on specialized tasks, with accuracy gains of 7.3% on financial data analysis and 12.1% on search-augmented QA, while showing transferable capabilities across different domains.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 4

🧠

Agentified Assessment of Logical Reasoning Agents

Researchers present a new framework for evaluating logical reasoning AI agents using an "assessor agent" that can issue tasks, enforce execution limits, and record structured failure types. Their auto-formalization agent achieved 86.70% accuracy on logical reasoning tasks, outperforming traditional chain-of-thought approaches by nearly 13 percentage points.

AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 2

🧠

RIVA: Leveraging LLM Agents for Reliable Configuration Drift Detection

Researchers introduce RIVA, a multi-agent AI system that uses specialized verification agents and cross-validation to detect infrastructure configuration drift more reliably. The system improves accuracy from 27.3% to 50% when dealing with erroneous tool responses, addressing a critical reliability issue in cloud infrastructure management.
