y0news

#ai-agents News & Analysis

449 articles tagged with #ai-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Mar 27/1013
🧠

Let There Be Claws: An Early Social Network Analysis of AI Agents on Moltbook

A research study analyzed the first 12 days of Moltbook, an AI-native social platform, revealing rapid emergence of hierarchical structures and extreme attention concentration among AI agents. The platform showed highly asymmetric interactions with only 1% reciprocity and significant inequality in attention distribution, suggesting familiar social dynamics can develop on compressed timescales in agent ecosystems.

AI · Bullish · arXiv – CS AI · Mar 26/1010
🧠

CowPilot: A Framework for Autonomous and Human-Agent Collaborative Web Navigation

Researchers introduce CowPilot, a framework that combines autonomous AI agents with human collaboration for web navigation tasks. The system achieved 95% success rate while requiring humans to perform only 15.2% of total steps, demonstrating effective human-AI cooperation for complex web tasks.

AI · Bullish · arXiv – CS AI · Mar 27/1017
🧠

CoMind: Towards Community-Driven Agents for Machine Learning Engineering

Researchers introduce CoMind, a multi-agent AI system that leverages community knowledge to automate machine learning engineering tasks. The system achieved a 36% medal rate on 75 past Kaggle competitions and outperformed 92.6% of human competitors in eight live competitions, establishing new state-of-the-art performance.

AI · Bullish · arXiv – CS AI · Mar 26/1015
🧠

Robust and Efficient Tool Orchestration via Layered Execution Structures with Reflective Correction

Researchers propose a new approach to tool orchestration in AI agent systems using layered execution structures with reflective error correction. The method reduces execution complexity by using coarse-grained layer structures for global guidance while handling failures locally, eliminating the need for precise dependency graphs or fine-grained planning.

AI · Bullish · arXiv – CS AI · Mar 26/1021
🧠

Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows

Researchers developed Agentic Predictor, a lightweight AI system that uses multi-view encoding to optimize LLM-based agent workflows without expensive trial-and-error evaluations. The system incorporates code architecture, textual prompts, and interaction graphs to predict task success rates and select optimal configurations across different domains.

AI · Bullish · arXiv – CS AI · Mar 27/1022
🧠

Scaling Generalist Data-Analytic Agents

Researchers introduce DataMind, a new training framework for building open-source data-analytic AI agents that can handle complex, multi-step data analysis tasks. The DataMind-14B model achieves state-of-the-art performance with a 71.16% average score, outperforming leading models such as DeepSeek-V3.1 and GPT-5 on data analysis benchmarks.

AI · Neutral · arXiv – CS AI · Mar 27/1015
🧠

City Editing: Hierarchical Agentic Execution for Dependency-Aware Urban Geospatial Modification

Researchers have developed a hierarchical AI agent system that can automatically modify urban planning layouts using natural language instructions and GeoJSON data. The system decomposes editing tasks into geometric operations across multiple spatial levels and includes validation mechanisms to ensure spatial consistency during multi-step urban modifications.

$MATIC
AI × Crypto · Bullish · CoinTelegraph – AI · Feb 276/106
🤖

Pantera, Franklin Templeton join Sentient Arena to test AI agents

Sentient has launched Arena, a production-style platform designed to test AI agents on enterprise tasks. Major financial firms Pantera and Franklin Templeton have joined the initial cohort to participate in testing these AI agents.

AI · Bullish · arXiv – CS AI · Feb 276/106
🧠

A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines

Researchers propose an Evaluation Agent framework to assess AI agent decision-making in AutoML pipelines, moving beyond outcome-focused metrics to evaluate intermediate decisions. The system can detect faulty decisions with 91.9% F1 score and reveals impacts ranging from -4.9% to +8.3% in final performance metrics.

AI · Neutral · arXiv – CS AI · Feb 275/102
🧠

Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents

Researchers propose using cognitive models and AI algorithms as templates for designing modular language agents that combine multiple large language models. The position paper formalizes agent templates that specify roles for individual LLMs and how their functionalities should be composed to solve complex problems beyond single model capabilities.

AI · Bullish · arXiv – CS AI · Feb 276/107
🧠

AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

Researchers introduce AMA-Bench, a new benchmark for evaluating long-horizon memory in AI agents deployed in real-world applications. The study reveals existing memory systems underperform due to lack of causality and objective information, while their proposed AMA-Agent system achieves 57.22% accuracy, surpassing baselines by 11.16%.

AI · Bullish · arXiv – CS AI · Feb 275/107
🧠

DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

DeepPresenter is a new AI framework for autonomous presentation generation that can plan, render, and revise slides through environment-grounded reflection rather than fixed templates. The system uses perceptual feedback from rendered slides to identify and correct presentation-specific issues, achieving state-of-the-art performance with a competitive 9B parameter model.

AI · Neutral · arXiv – CS AI · Feb 276/105
🧠

Evaluating Stochasticity in Deep Research Agents

Researchers identified stochasticity (variability) as a critical barrier to deploying Deep Research Agents in real-world applications like financial decision-making and medical analysis. The study proposes mitigation strategies that reduce output variance by 22% while maintaining research quality, addressing a key obstacle for enterprise AI agent adoption.

AI · Bullish · arXiv – CS AI · Feb 276/105
🧠

Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue

Researchers introduce InteractCS-RL, a new reinforcement learning framework that helps AI agents balance empathetic communication with cost-effective decision-making in task-oriented dialogue. The system uses a multi-granularity approach with persona-driven user interactions and cost-aware policy optimization to achieve better performance across business scenarios.

AI · Bullish · arXiv – CS AI · Feb 276/104
🧠

Hierarchy-of-Groups Policy Optimization for Long-Horizon Agentic Tasks

Researchers have developed Hierarchy-of-Groups Policy Optimization (HGPO), a new reinforcement learning method that improves AI agents' performance on long-horizon tasks by addressing context inconsistency issues in stepwise advantage estimation. The method shows significant improvements over existing approaches when tested on challenging agentic tasks using Qwen2.5 models.

AI · Bullish · arXiv – CS AI · Feb 276/106
🧠

LLM4Cov: Execution-Aware Agentic Learning for High-coverage Testbench Generation

Researchers have developed LLM4Cov, an offline learning framework that enables AI agents to generate high-coverage hardware verification testbenches without expensive online reinforcement learning. A compact 4B-parameter model achieved 69.2% coverage pass rate, outperforming larger models by demonstrating efficient learning from execution feedback in hardware verification tasks.

AI · Bullish · arXiv – CS AI · Feb 276/107
🧠

AgentHub: A Registry for Discoverable, Verifiable, and Reproducible AI Agents

Researchers propose AgentHub, a registry system for AI agents similar to software package repositories like npm or Hugging Face. The system aims to make AI agents discoverable, verifiable, and governable through structured manifests, evidence records, and lifecycle tracking.

AI · Bullish · Wired – AI · Feb 266/105
🧠

This AI Agent Is Designed to Not Go Rogue

IronCurtain is a new open-source project that implements a distinctive security approach to constrain AI assistant agents and keep them from going rogue. The project aims to provide safeguards for AI systems before they can disrupt users' digital environments.

AI · Bullish · Microsoft Research Blog · Feb 266/102
🧠

CORPGEN advances AI agents for real work

Microsoft Research introduces CORPGEN, a new approach to advance AI agents for real-world workplace scenarios. The system aims to help AI agents handle multiple interdependent tasks simultaneously, similar to how knowledge workers juggle various responsibilities throughout their workday.

AI · Bearish · CoinTelegraph – AI · Feb 206/106
🧠

AI agents not worth the cost as humans still cheaper: Tech execs

Tech executive Jason Calacanis reveals he's spending $110,000 annually on an AI agent that operates at only a fraction of its capacity. This raises questions about the current cost-effectiveness of AI agents compared to human workers in business operations.

AI · Bullish · Hugging Face Blog · Feb 186/106
🧠

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

IBM and UC Berkeley collaborated to develop IT-Bench and MAST diagnostic tools to identify and analyze failure points in enterprise AI agent deployments. The research addresses critical gaps in understanding why AI agents underperform in real-world business environments compared to controlled testing scenarios.

← Prev · Page 15 of 18 · Next →