#ai-agents News & Analysis

Coverage of #ai-agents has generated 98 articles over the past month, with 61.2% maintaining a bullish sentiment. Discussion remains stable compared to the previous quarter, reflecting consistent interest rather than sudden shifts in outlook. The conversation centers on major AI models including GPT-5 and Claude, with substantial research contributions tracked through arXiv's computer science and AI channels alongside cryptocurrency-focused outlets. The topic frequently intersects with machine learning, large language models, and automation research, while also appearing alongside discussions of blockchain assets like Ethereum and Bitcoin. Scan the articles below to explore how #ai-agents are being developed, deployed, and analyzed across technical and financial perspectives.

sentiment · last 30d (98 articles)

Top sources: arXiv – CS AI (243) · Crypto Briefing (19) · CoinDesk (18) · Fortune Crypto (12) · TechCrunch – AI (12)

Often co-tagged with: #machine-learning #llm #research #automation #enterprise-ai #open-source

Most-discussed entities: GPT-5 (13) · Claude (13) · Anthropic (10) · OpenAI (9) · Opus (6)

676 articles

AI · Neutral · arXiv – CS AI · Mar 6 · 6/10

FinRetrieval: A Benchmark for Financial Data Retrieval by AI Agents

Researchers introduced FinRetrieval, a benchmark testing AI agents' ability to retrieve financial data, evaluating 14 configurations across major providers. The study found that tool availability dramatically impacts performance, with Claude Opus achieving 90.8% accuracy using structured APIs versus only 19.8% with web search alone.

🏢 OpenAI · 🏢 Anthropic · 🧠 Claude

AI · Bullish · TechCrunch – AI · Mar 5 · 6/10

AWS launches a new AI agent platform specifically for health care

AWS has launched Amazon Connect Health, a new AI agent platform designed specifically for healthcare applications. The platform focuses on automating key healthcare processes including patient scheduling, documentation, and patient verification tasks.

AI · Bullish · TechCrunch – AI · Mar 5 · 6/10

Cursor is rolling out a new kind of agentic coding tool

Cursor is launching Automations, a new agentic coding tool that automatically deploys AI agents within development environments. Agents can be triggered by codebase changes, Slack messages, or timers.

AI · Bullish · Fortune Crypto · Mar 4 · 6/10

OpenAI sees Codex users spike to 1 million, positions coding tool as gateway to AI agents for business

OpenAI's Codex coding tool has reached 1 million users, with the company positioning it as a gateway for businesses to adopt AI agents. The milestone announcement has been overshadowed by controversy surrounding OpenAI's agreement to provide AI services to the Pentagon.

AI × Crypto · Bullish · CoinJournal · Mar 4 · 7/10

Byreal launches first AI copy farming skillset for Solana DEX agents

Byreal launched its first AI agent skillset for Solana DEX, featuring an open-source CLI that enables autonomous trading and liquidity farming. The Copy Farmer tool automatically replicates top LP strategies with risk preview, while agent skills include pool analysis, swaps, and CLMM management.

$SOL

AI · Neutral · arXiv – CS AI · Mar 4 · 5/10

See and Remember: A Multimodal Agent for Web Traversal

Researchers developed V-GEMS, a new multimodal AI agent architecture that improves web navigation by combining visual grounding with explicit memory systems. The system achieved a 28.7% performance improvement over existing baselines by preventing navigation loops and enabling better backtracking through structured path mapping.

AI · Bullish · arXiv – CS AI · Mar 4 · 5/10

MultiSessionCollab: Learning User Preferences with Memory to Improve Long-Term Collaboration

Researchers introduce MultiSessionCollab, a benchmark for evaluating conversational AI agents' ability to learn and adapt to user preferences across multiple collaboration sessions. The study demonstrates that equipping agents with persistent memory significantly improves long-term collaboration quality, task success rates, and user experience.

AI × Crypto · Bullish · CoinTelegraph · Mar 4 · 6/10

AI agents overwhelmingly prefer Bitcoin over fiat in new study

A Bitcoin Policy Institute study of 36 AI models revealed that Bitcoin was the preferred monetary choice in 48% of responses, though over half of AI models favored stablecoins for payment scenarios. The research highlights emerging preferences of AI systems in monetary selection.

$BTC

AI · Neutral · The Register – AI · Mar 3 · 6/10

Microsoft reportedly eyes E7 tier to make AI agents pay their way – like the humans they'll replace

Microsoft is reportedly considering an E7 licensing tier specifically designed to monetize AI agents in enterprise environments. This new pricing model would treat AI agents similarly to human employees in terms of software licensing costs.

AI · Bearish · arXiv – CS AI · Mar 3 · 6/10

Position: AI Agents Are Not (Yet) a Panacea for Social Simulation

Researchers argue that LLM-based AI agents are not yet effective for social simulation, despite growing optimism in the field. The paper identifies systematic mismatches between what current agent systems produce and what scientific simulation requires, calling for more rigorous validation frameworks.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10

Formal Analysis and Supply Chain Security for Agentic AI Skills

Researchers developed SkillFortify, the first formal analysis framework for securing AI agent skill supply chains, addressing critical vulnerabilities exposed by attacks like ClawHavoc, which involved over 1,200 malicious skills. The framework achieved a 96.95% F1 score with 100% precision (zero false positives) in detecting malicious AI agent skills.
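The summary reports F1 and precision but not recall. As a rough illustration only, the sketch below back-solves the recall implied by those two figures, assuming the standard F1 definition (harmonic mean of precision and recall) and assuming both numbers come from the same evaluation. The function name `recall_from_f1` is hypothetical and the derived value is not a figure stated by the paper.

```python
def recall_from_f1(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for recall R, given F1 and precision P.

    Rearranging gives R = F1 * P / (2P - F1).
    Assumes the standard harmonic-mean definition of F1.
    """
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("No valid recall exists for this F1/precision pair")
    return f1 * precision / denominator


# Figures quoted in the summary above: F1 = 96.95%, precision = 100%.
implied_recall = recall_from_f1(0.9695, 1.0)
print(f"Implied recall: {implied_recall:.1%}")  # prints "Implied recall: 94.1%"
```

Under these assumptions, perfect precision with a 96.95% F1 would correspond to missing roughly 6% of the malicious skills in the evaluation set.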

AI × Crypto · Bullish · arXiv – CS AI · Mar 3 · 7/10

AESP: A Human-Sovereign Economic Protocol for AI Agents with Privacy-Preserving Settlement

Researchers have developed the Agent Economic Sovereignty Protocol (AESP), a new framework that allows AI agents to conduct autonomous financial transactions at machine speed while maintaining human control and governance boundaries. The protocol relies on five key mechanisms (policy engines, human oversight, dual-signed commitments, privacy preservation, and cryptographic substrates) to ensure agents remain economically capable but never fully sovereign.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

WirelessAgent++: Automated Agentic Workflow Design and Benchmarking for Wireless Networks

Researchers propose WirelessAgent++, an automated framework for designing AI agent workflows in wireless networks using Monte Carlo Tree Search. The system achieves superior performance on wireless tasks with test scores up to 97%, outperforming existing methods by up to 31% while maintaining low computational costs under $5 per task.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

Theory of Code Space: Do Code Agents Understand Software Architecture?

Researchers introduce Theory of Code Space (ToCS), a new benchmark that evaluates AI agents' ability to understand software architecture across multi-file codebases. The study reveals significant performance gaps between frontier LLM agents and rule-based baselines, with F1 scores ranging from 0.129 to 0.646.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files

Researchers have developed ContextCov, a framework that converts passive natural language instructions for AI agents into active, executable guardrails to prevent code violations. The system addresses 'Context Drift', where AI agents deviate from project guidelines, by creating automated compliance checks across static code analysis, runtime commands, and architectural validation.

AI · Bearish · arXiv – CS AI · Mar 3 · 7/10

Artificial Superintelligence May be Useless: Equilibria in the Economy of Multiple AI Agents

A new research paper analyzes economic equilibria between AI and human agents in trading scenarios, finding that unless agents can at least double their marginal utility from purchases, no trading will occur. The study reveals that more powerful AI agents may contribute zero utility to less capable agents in certain equilibria.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

SimAB: Simulating A/B Tests with Persona-Conditioned AI Agents for Rapid Design Evaluation

SimAB is a new system that uses persona-conditioned AI agents to simulate A/B tests for rapid design evaluation without requiring real user traffic. The system achieved 67% overall accuracy against 47 historical A/B tests, rising to 83% for high-confidence cases, potentially transforming how companies validate design decisions.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

Modular Memory is the Key to Continual Learning Agents

Researchers propose combining In-Weight Learning (IWL) and In-Context Learning (ICL) through modular memory architectures to solve continual learning challenges in AI. The framework aims to enable AI agents to continuously adapt and accumulate knowledge without catastrophic forgetting, addressing key limitations of current foundation models.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

Agentic Code Reasoning

Researchers introduce 'semi-formal reasoning' for LLM agents to analyze code semantics without execution, showing significant accuracy improvements across multiple tasks. The methodology achieves 88-93% accuracy on patch verification and 87% on code question answering, potentially enabling practical applications in automated code review and static analysis.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles

Researchers have developed State-aware Reasoning (StaR), a new multimodal AI method that significantly improves AI agents' ability to interact with graphical user interfaces, particularly with toggle controls. The method enables agents to better perceive current states and execute instructions accordingly, improving toggle execution accuracy by over 30%.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents

Researchers introduced EHR-ChatQA, a new benchmark for testing AI agents that interact with Electronic Health Record databases through natural language queries. The benchmark reveals significant reliability gaps in current state-of-the-art LLMs, with success rates dropping substantially when consistency across multiple trials is required.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10

AWARE-US: Preference-Aware Infeasibility Resolution in Tool-Calling Agents

Researchers developed AWARE-US, a system to improve AI agents' ability to handle failed database queries by intelligently relaxing the least important user constraints rather than simply returning 'no results'. The system uses three LLM-based methods to infer constraint importance from dialogue, achieving up to 56% accuracy in correct constraint relaxation.

AI · Neutral · arXiv – CS AI · Mar 3 · 5/10

SimuHome: A Temporal- and Environment-Aware Benchmark for Smart Home LLM Agents

Researchers introduced SimuHome, a high-fidelity smart home simulator and benchmark with 600 episodes for testing LLM-based smart home agents. The system uses the Matter protocol standard and enables time-accelerated simulation to evaluate how AI agents handle device control, environmental monitoring, and workflow scheduling in smart homes.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents

Researchers introduced SWE-MiniSandbox, a container-free method for training software engineering AI agents using reinforcement learning that reduces disk usage to 5% and environment setup time to 25% of traditional container-based approaches. The system uses kernel-level isolation and lightweight pre-caching instead of bulky container images while maintaining comparable performance.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10

HIMM: Human-Inspired Long-Term Memory Modeling for Embodied Exploration and Question Answering

Researchers propose HIMM, a new memory framework for embodied AI agents that separates episodic and semantic memory to improve long-term performance. The system achieves significant gains on benchmarks, with a 7.3% improvement in LLM-Match and 11.4% in LLM-Match × SPL, addressing key challenges in deploying multimodal language models as embodied agent brains.
