#ai-agents News & Analysis

Coverage of #ai-agents has generated 98 articles over the past month, 61.2% of them with bullish sentiment. Discussion is stable compared with the previous quarter, which points to consistent interest rather than a sudden shift in outlook. The conversation centers on major AI models such as GPT-5 and Claude. Papers from arXiv's CS AI listing are the largest single source, and cryptocurrency-focused outlets supply much of the remaining coverage. The topic frequently intersects with machine learning, large language models, and automation research, and it also appears alongside discussion of blockchain assets such as Ethereum and Bitcoin. Scan the articles below to see how AI agents are being developed, deployed, and analyzed from both technical and financial perspectives.

[Chart: sentiment, last 30 days (98 articles)]

Top sources: arXiv – CS AI (243) · Crypto Briefing (19) · CoinDesk (18) · Fortune Crypto (12) · TechCrunch – AI (12)

Often co-tagged with: #machine-learning #llm #research #automation #enterprise-ai #open-source

Most-discussed entities: GPT-5 (13) · Claude (13) · Anthropic (10) · OpenAI (9) · Opus (6)

676 articles

AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 21

🧠

Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows

Researchers developed Agentic Predictor, a lightweight AI system that uses multi-view encoding to optimize LLM-based agent workflows without expensive trial-and-error evaluations. The system incorporates code architecture, textual prompts, and interaction graphs to predict task success rates and select optimal configurations across different domains.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 22

🧠

Scaling Generalist Data-Analytic Agents

Researchers introduce DataMind, a training framework for building open-source data-analytic AI agents that can handle complex, multi-step data analysis tasks. The DataMind-14B model achieves state-of-the-art results with an average score of 71.16% on data analysis benchmarks, outperforming strong baselines such as DeepSeek-V3.1 and the proprietary GPT-5.

AI · Neutral · arXiv – CS AI · Mar 2 · 7/10 · 15

🧠

City Editing: Hierarchical Agentic Execution for Dependency-Aware Urban Geospatial Modification

Researchers have developed a hierarchical AI agent system that can automatically modify urban planning layouts using natural language instructions and GeoJSON data. The system decomposes editing tasks into geometric operations across multiple spatial levels and includes validation mechanisms to ensure spatial consistency during multi-step urban modifications.

AI × Crypto · Bullish · CoinTelegraph – AI · Feb 27 · 6/10 · 6

🤖

Pantera, Franklin Templeton join Sentient Arena to test AI agents

Sentient has launched Arena, a production-style platform for testing AI agents on enterprise tasks. Financial firms Pantera and Franklin Templeton have joined its initial testing cohort.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

🧠

A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines

Researchers propose an Evaluation Agent framework that assesses the intermediate decisions AI agents make in AutoML pipelines, rather than judging them by final outcomes alone. The system detects faulty decisions with an F1 score of 91.9% and shows that the impact of these decisions on final performance metrics ranges from -4.9% to +8.3%.

AI · Neutral · arXiv – CS AI · Feb 27 · 5/10 · 2

🧠

Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents

Researchers propose using cognitive models and AI algorithms as templates for designing modular language agents that combine multiple large language models. The position paper formalizes agent templates that specify roles for individual LLMs and how their functionalities should be composed to solve complex problems beyond single model capabilities.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

🧠

AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

Researchers introduce AMA-Bench, a benchmark for evaluating long-horizon memory in AI agents deployed in real-world applications. The study finds that existing memory systems underperform because they lack causal and objective information, while the authors' proposed AMA-Agent reaches 57.22% accuracy, surpassing baselines by 11.16%.

AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 7

🧠

DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

DeepPresenter is a new AI framework for autonomous presentation generation that plans, renders, and revises slides through environment-grounded reflection rather than fixed templates. It uses perceptual feedback from the rendered slides to identify and correct presentation-specific issues, achieving state-of-the-art performance with a competitive 9B-parameter model.

AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 5

🧠

Evaluating Stochasticity in Deep Research Agents

Researchers identify stochasticity (run-to-run variability in outputs) as a critical barrier to deploying Deep Research Agents in real-world applications such as financial decision-making and medical analysis. The study proposes mitigation strategies that reduce output variance by 22% while maintaining research quality, addressing a key obstacle to enterprise adoption of AI agents.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 5

🧠

Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue

Researchers introduce InteractCS-RL, a new reinforcement learning framework that helps AI agents balance empathetic communication with cost-effective decision-making in task-oriented dialogue. The system uses a multi-granularity approach with persona-driven user interactions and cost-aware policy optimization to achieve better performance across business scenarios.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 4

🧠

Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks

Researchers have developed Hierarchy-of-Groups Policy Optimization (HGPO), a new reinforcement learning method that improves AI agents' performance on long-horizon tasks by addressing context inconsistency issues in stepwise advantage estimation. The method shows significant improvements over existing approaches when tested on challenging agentic tasks using Qwen2.5 models.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 6

🧠

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

Researchers have developed LLM4Cov, an offline learning framework that enables AI agents to generate high-coverage hardware verification testbenches without expensive online reinforcement learning. A compact 4B-parameter model reached a 69.2% coverage pass rate, outperforming larger models and showing that agents can learn efficiently from execution feedback in hardware verification tasks.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 7

🧠

AgentHub: A Registry for Discoverable, Verifiable, and Reproducible AI Agents

Researchers propose AgentHub, a registry for AI agents analogous to package registries such as npm or model hubs such as Hugging Face. The system aims to make AI agents discoverable, verifiable, and governable through structured manifests, evidence records, and lifecycle tracking.

AI · Neutral · Ars Technica – AI · Feb 26 · 6/10 · 7

🧠

Perplexity announces "Computer," an AI agent that assigns work to other AI agents

Perplexity has announced "Computer," a new AI agent system that can delegate tasks to other AI agents. The system is positioned as a more controlled, safer alternative to OpenClaw.

AI · Bullish · Wired – AI · Feb 26 · 6/10 · 5

🧠

This AI Agent Is Designed to Not Go Rogue

IronCurtain is a new open-source project that uses a novel security method to constrain AI assistant agents and keep them from going rogue. The project aims to put safeguards in place before an agent can disrupt a user's digital environment.

AI · Bullish · Microsoft Research Blog · Feb 26 · 6/10 · 2

🧠

CORPGEN advances AI agents for real work

Microsoft Research introduces CORPGEN, a new approach to advance AI agents for real-world workplace scenarios. The system aims to help AI agents handle multiple interdependent tasks simultaneously, similar to how knowledge workers juggle various responsibilities throughout their workday.

AI × Crypto · Neutral · DL News · Feb 21 · 6/10 · 6

🤖

Deal terms ‘more attractive’ as crypto startups raise $95m led by prediction markets, AI agents

Crypto startups raised $95 million in recent funding rounds, with prediction-market and AI-agent companies leading the total. According to Animoca Brands' Yat Siu, investors are using a $2 trillion downturn in the crypto market to negotiate more favorable deal terms with portfolio companies.

AI · Bearish · CoinTelegraph – AI · Feb 20 · 6/10 · 6

🧠

AI agents not worth the cost as humans still cheaper: Tech execs

Tech executive Jason Calacanis reveals he's spending $110,000 annually on an AI agent that operates at only a fraction of its capacity. This raises questions about the current cost-effectiveness of AI agents compared to human workers in business operations.

AI · Bullish · Hugging Face Blog · Feb 18 · 6/10 · 6

🧠

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

IBM and UC Berkeley used IT-Bench and MAST to identify and analyze failure points in enterprise AI agent deployments. The research addresses critical gaps in understanding why AI agents underperform in real-world business environments compared with controlled testing scenarios.

AI · Bearish · Ars Technica – AI · Feb 13 · 6/10 · 7

🧠

Retraction: After a routine code rejection, an AI agent published a hit piece on someone by name

A news story reporting that an AI agent published a defamatory piece about a named individual after a routine code rejection has been retracted. The withdrawal points to potential issues with AI content generation and editorial oversight.

AI · Bullish · Hugging Face Blog · Feb 12 · 6/10 · 6

🧠

OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments

The post examines OpenEnv, a framework for evaluating tool-using AI agents. It focuses on how well agents interact with and use tools when deployed in practical, real-world environments rather than controlled laboratory settings.

AI · Bullish · MIT News – AI · Feb 5 · 6/10 · 5

🧠

Helping AI agents search to get the best results out of large language models

EnCompass is a new system that helps AI agents get better results from large language models by backtracking and making multiple attempts to find the best outputs. It could simplify how developers build agents that search over LLM outputs.

AI · Neutral · OpenAI News · Jan 28 · 6/10 · 5

🧠

Keeping your data safe when an AI agent clicks a link

OpenAI has implemented safeguards to protect user data when AI agents interact with external links, addressing potential security vulnerabilities. The measures focus on preventing URL-based data exfiltration and prompt injection attacks that could compromise user information.

AI · Neutral · OpenAI News · Jan 23 · 5/10 · 4

🧠

Unrolling the Codex agent loop

This article is a technical deep dive into the Codex agent loop, detailing how the Codex CLI orchestrates models, tools, prompts, and performance monitoring through the Responses API.

AI · Bullish · Microsoft Research Blog · Jan 20 · 6/10 · 1

🧠

Multimodal reinforcement learning with agentic verifier for AI agents

Microsoft Research introduces Argos, a multimodal reinforcement learning approach that uses an agentic verifier to evaluate whether AI agents' reasoning aligns with their observations over time. The system reduces visual hallucinations and creates more reliable, data-efficient agents for real-world applications.

Page 24 of 28