#ai-agents News & Analysis

Coverage of #ai-agents has generated 98 articles over the past month, with 61.2% maintaining a bullish sentiment. Discussion remains stable compared to the previous quarter, reflecting consistent interest rather than sudden shifts in outlook. The conversation centers on major AI models including GPT-5 and Claude, with substantial research contributions tracked through arXiv's computer science and AI channels alongside cryptocurrency-focused outlets. The topic frequently intersects with machine learning, large language models, and automation research, while also appearing alongside discussions of blockchain assets like Ethereum and Bitcoin. Scan the articles below to explore how #ai-agents are being developed, deployed, and analyzed across technical and financial perspectives.

sentiment · last 30d (98 articles)

Top sources: arXiv – CS AI (243) · Crypto Briefing (19) · CoinDesk (18) · Fortune Crypto (12) · TechCrunch – AI (12)

Often co-tagged with: #machine-learning #llm #research #automation #enterprise-ai #open-source

Most-discussed entities: GPT-5 (13) · Claude (13) · Anthropic (10) · OpenAI (9) · Opus (6)

676 articles

AI × Crypto · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 8

🤖

TraderBench: How Robust Are AI Agents in Adversarial Capital Markets?

TraderBench introduces a new benchmark for evaluating AI agents in financial markets, combining expert-verified static tasks with adversarial trading simulations. The study found that 8 of 13 tested AI models showed minimal variation across market conditions, indicating they rely on fixed strategies rather than adaptive market behavior.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

🧠

DenoiseFlow: Uncertainty-Aware Denoising for Reliable LLM Agentic Workflows

Researchers introduce DenoiseFlow, a framework that addresses reliability issues in AI agent workflows by managing uncertainty through adaptive computation allocation and error correction. The system achieves 83.3% average accuracy across benchmarks while reducing computational costs by 40-56% through intelligent branching decisions.

$COMP

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

🧠

SWE-Hub: A Unified Production System for Scalable, Executable Software Engineering Tasks

Researchers introduce SWE-Hub, a comprehensive system for generating scalable, executable software engineering tasks for training AI agents. The platform addresses current limitations in AI software development by providing unified environment automation, bug synthesis, and diverse task generation across multiple programming languages.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

🧠

TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces

Researchers introduce TraceSIR, a multi-agent framework that analyzes execution traces from AI agentic systems to diagnose failures and optimize performance. The system uses three specialized agents to compress traces, identify issues, and generate comprehensive analysis reports, significantly outperforming existing approaches in evaluation tests.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

🧠

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

Researchers introduce InfoPO (Information-Driven Policy Optimization), a new method that improves AI agent interactions by using information-gain rewards to identify valuable conversation turns. The approach addresses credit assignment problems in multi-turn interactions and outperforms existing baselines across diverse tasks including intent clarification and collaborative coding.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 9

🧠

K^2-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control

Researchers introduce K²-Agent, a hierarchical AI framework for mobile device control that separates 'know-what' and 'know-how' knowledge to achieve a 76.1% success rate on the AndroidWorld benchmark. The system uses a high-level reasoner for task planning and a low-level executor for skill execution, and it generalizes well across different models and tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

🧠

AutoSkill: Experience-Driven Lifelong Learning via Skill Self-Evolution

AutoSkill is a new framework that enables AI language models to learn and reuse personalized skills from user interactions without retraining the underlying model. The system abstracts user preferences into reusable capabilities that can be shared across different agents and tasks, addressing the current limitation where LLMs fail to retain personalized learning between sessions.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 7

🧠

How Well Does Agent Development Reflect Real-World Work?

A study of 43 AI agent benchmarks and 72,342 tasks finds significant misalignment between current agent development efforts and real-world human work patterns across 1,016 U.S. occupations. Agent development is overly programming-centric relative to where human labor and economic value are actually concentrated.

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 7

🧠

Agents Learn Their Runtime: Interpreter Persistence as Training-Time Semantics

Researchers found that AI agents perform better when their training data matches their deployment environment, specifically regarding interpreter state persistence. Models trained with persistent state but deployed in stateless environments trigger errors in 80% of cases, while the reverse wastes 3.5x more tokens through redundant computations.
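
The mismatch described above can be illustrated with a small sketch. This is not the paper's setup: `run_steps` and the two code snippets are hypothetical, and Python's built-in `exec` stands in for an agent's interpreter. With persistent state, a variable defined in one step is visible in the next; in a stateless runtime, the second step fails.

```python
def run_steps(steps, persistent):
    """Execute code snippets one by one, as an agent runtime might.

    persistent=True  -> all steps share one namespace (state survives).
    persistent=False -> each step gets a fresh namespace (stateless).
    """
    shared = {}
    results = []
    for code in steps:
        namespace = shared if persistent else {}
        try:
            exec(code, namespace)
            results.append("ok")
        except NameError as err:
            # A step relied on state that this runtime did not keep.
            results.append(f"error: {err}")
    return results


# Hypothetical two-step plan written under the assumption of persistence.
steps = ["x = 21", "y = x * 2"]

print(run_steps(steps, persistent=True))   # both steps succeed
print(run_steps(steps, persistent=False))  # second step hits a NameError
```

An agent that instead assumes a stateless runtime would repeat `x = 21` in every step, which is harmless but redundant when state does persist.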

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10 · 8

🧠

ASTRA-bench: Evaluating Tool-Use Agent Reasoning and Action Planning with Personal User Context

Researchers released ASTRA-bench, a new benchmark for evaluating AI agents' ability to handle complex, multi-step reasoning with personal context and tool usage. Testing revealed that current state-of-the-art models like Claude-4.5-Opus and DeepSeek-V3.2 show significant performance degradation in high-complexity scenarios.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

🧠

SciDER: Scientific Data-centric End-to-end Researcher

Researchers have introduced SciDER, an AI-powered system that automates the entire scientific research process from data analysis to hypothesis generation and code execution. The system uses specialized AI agents that can collaboratively process raw experimental data and outperforms existing general-purpose AI models in scientific discovery tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 6

🧠

S5-HES Agent: Society 5.0-driven Agentic Framework to Democratize Smart Home Environment Simulation

Researchers have developed S5-HES Agent, an AI-driven framework that democratizes smart home research by enabling natural language configuration of simulations without programming expertise. The system uses large language models and retrieval-augmented generation to make smart home environment testing accessible to broader research communities beyond traditional technical experts.

$NEAR

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 7

🧠

ToolRLA: Fine-Grained Reward Decomposition for Tool-Integrated Reinforcement Learning Alignment in Domain-Specific Agents

Researchers developed ToolRLA, a three-stage reinforcement learning pipeline that significantly improves AI agents' ability to use external tools and APIs for domain-specific tasks. The system achieved a 47% higher task completion rate and 93% fewer regulatory violations when deployed in a real-world financial advisory copilot serving 80+ advisors with 1,200+ daily queries.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 6

🧠

CeProAgents: A Hierarchical Agents System for Automated Chemical Process Development

Researchers propose CeProAgents, a hierarchical multi-agent system that automates chemical process development using AI agents specialized in knowledge, concept, and parameter tasks. The system introduces CeProBench, a comprehensive benchmark for evaluating AI capabilities in chemical engineering applications.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 8

🧠

FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents

Researchers introduce FT-Dojo, an interactive environment for studying autonomous LLM fine-tuning, along with FT-Agent, an AI system that can automatically fine-tune language models without human intervention. The system achieved the best performance on 10 of 13 tasks across five domains, demonstrating the potential for fully automated machine learning workflows while revealing current limitations in AI reasoning capabilities.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

🧠

CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

Researchers introduce CoVe, a framework for training interactive tool-use AI agents that uses constraint-guided verification to generate high-quality training data. The compact CoVe-4B model performs competitively with models 17 times larger on benchmark tests, and the team is open-sourcing the code, models, and 12K training trajectories.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 7

🧠

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Researchers propose ActMem, a novel memory framework for LLM agents that combines memory retrieval with active causal reasoning to handle complex decision-making scenarios. The framework transforms dialogue history into structured causal graphs and uses counterfactual reasoning to resolve conflicts between past states and current intentions, significantly outperforming existing baselines in memory-dependent tasks.

AI × Crypto · Bearish · CoinTelegraph · Mar 2 · 6/10 · 8

🤖

Energym AI dystopia goes viral as crypto projects tout user-owned AI agents

A viral Black Mirror-style 'Energym' spoof depicting 80% job losses to AI is circulating amid real-world tech layoffs and declining white-collar job openings. The dystopian scenario resonates as tech companies continue mass workforce reductions while crypto projects promote user-owned AI agents as an alternative model.

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 12

🧠

An Agentic LLM Framework for Adverse Media Screening in AML Compliance

Researchers have developed an agentic LLM framework using Retrieval-Augmented Generation to automate adverse media screening for anti-money laundering compliance in financial institutions. The system addresses high false-positive rates in traditional keyword-based approaches by implementing multi-step web searches and computing Adverse Media Index scores to distinguish between high-risk and low-risk individuals.

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 13

🧠

Let There Be Claws: An Early Social Network Analysis of AI Agents on Moltbook

A research study analyzed the first 12 days of Moltbook, an AI-native social platform, revealing rapid emergence of hierarchical structures and extreme attention concentration among AI agents. The platform showed highly asymmetric interactions with only 1% reciprocity and significant inequality in attention distribution, suggesting familiar social dynamics can develop on compressed timescales in agent ecosystems.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 13

🧠

Keyword search is all you need: Achieving RAG-Level Performance without vector databases using agentic tool use

Researchers found that simple keyword search within agentic AI frameworks can achieve over 90% of the performance of traditional RAG systems without requiring vector databases. This approach offers a more cost-effective and simpler alternative for AI applications requiring frequent knowledge base updates.
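
As a rough illustration of the idea, the sketch below shows what a keyword-search tool without a vector database might look like. It is an assumption-laden toy, not the paper's implementation: the `keyword_search` function, the `docs` knowledge base, and the term-count scoring are all hypothetical. An agent framework would expose a function like this as a tool and let the model choose and refine the query.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(documents, query, top_k=3):
    """Rank documents by how often the query terms occur in them.

    No embeddings or index are needed, so adding or editing a document
    takes effect immediately on the next call.
    """
    terms = set(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken alphabetically by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical knowledge base for illustration only.
docs = {
    "refunds": "Refunds are issued within 14 days of a return request.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
    "returns": "A return request must include the order number.",
}

print(keyword_search(docs, "return request refunds"))  # ['refunds', 'returns']
```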

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 10

🧠

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation

Researchers introduce CowPilot, a framework that combines autonomous AI agents with human collaboration for web navigation tasks. The system achieved a 95% success rate while requiring humans to perform only 15.2% of total steps, demonstrating effective human-AI cooperation on complex web tasks.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 17

🧠

CoMind: Towards Community-Driven Agents for Machine Learning Engineering

Researchers introduce CoMind, a multi-agent AI system that leverages community knowledge to automate machine learning engineering tasks. The system achieved a 36% medal rate on 75 past Kaggle competitions and outperformed 92.6% of human competitors in eight live competitions, establishing new state-of-the-art performance.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 17

🧠

IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents

Researchers introduced IntentCUA, a multi-agent framework for computer automation that achieved a 74.83% task success rate through intent-aligned planning and memory systems. The system uses coordinated agents (Planner, Plan-Optimizer, and Critic) to reduce error accumulation and improve efficiency in long-horizon desktop automation tasks.

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 15

🧠

Robust and Efficient Tool Orchestration via Layered Execution Structures with Reflective Correction

Researchers propose a new approach to tool orchestration in AI agent systems using layered execution structures with reflective error correction. The method reduces execution complexity by using coarse-grained layer structures for global guidance while handling failures locally, eliminating the need for precise dependency graphs or fine-grained planning.
