y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#agentic-ai News & Analysis

64 articles tagged with #agentic-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

64 articles
AINeutralarXiv – CS AI · 6d ago7/10
🧠

Benchmarking LLM Tool-Use in the Wild

Researchers introduce WildToolBench, a new benchmark for evaluating large language models' ability to use tools in real-world scenarios. Testing 57 LLMs reveals that none exceed 15% accuracy, exposing significant gaps in current models' agentic capabilities when facing messy, multi-turn user interactions rather than simplified synthetic tasks.

AIBullisharXiv – CS AI · 6d ago7/10
🧠

DosimeTron: Automating Personalized Monte Carlo Radiation Dosimetry in PET/CT with Agentic AI

DosimeTron, an agentic AI system powered by GPT-5.2, automates personalized Monte Carlo radiation dosimetry calculations for PET/CT medical imaging. Validated on 597 studies across 378 patients, the system achieved 99.6% correlation with reference dosimetry calculations while processing each case in approximately 32 minutes with zero execution failures.

🧠 GPT-5
AIBearisharXiv – CS AI · 6d ago7/10
🧠

TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

Researchers introduce TraceSafe-Bench, a benchmark evaluating how well LLM guardrails detect safety risks across multi-step tool-using trajectories. The study reveals that guardrail effectiveness depends more on structural reasoning capabilities than semantic safety training, and that general-purpose LLMs outperform specialized safety models in detecting mid-execution vulnerabilities.

AIBullisharXiv – CS AI · 6d ago7/10
🧠

Computer Environments Elicit General Agentic Intelligence in LLMs

Researchers introduce LLM-in-Sandbox, a minimal computer environment that significantly enhances large language models' capabilities across diverse tasks without additional training. The approach enables weaker models to internalize agent-like behaviors through specialized training, demonstrating that environmental interaction—not just model parameters—drives general intelligence in LLMs.

AIBullisharXiv – CS AI · Apr 67/10
🧠

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

GrandCode, a new multi-agent reinforcement learning system, has become the first AI to consistently defeat all human competitors in live competitive programming contests, placing first in three recent Codeforces competitions. This breakthrough demonstrates AI has now surpassed even the strongest human programmers in the most challenging coding tasks.

🧠 Gemini
AINeutralarXiv – CS AI · Mar 277/10
🧠

ARC-AGI-3: A New Challenge for Frontier Agentic Intelligence

Researchers introduce ARC-AGI-3, a new benchmark for testing agentic AI systems that focuses on fluid adaptive intelligence without relying on language or external knowledge. While humans can solve 100% of the benchmark's abstract reasoning tasks, current frontier AI systems score below 1% as of March 2026.

AINeutralOpenAI News · Mar 257/10
🧠

Introducing the OpenAI Safety Bug Bounty program

OpenAI has launched a Safety Bug Bounty program designed to identify and address AI safety risks and potential abuse vectors. The program specifically targets vulnerabilities including agentic risks, prompt injection attacks, and data exfiltration threats.

🏢 OpenAI
AIBearisharXiv – CS AI · Mar 177/10
🧠

The Ghost in the Grammar: Methodological Anthropomorphism in AI Safety Evaluations

A philosophical analysis critiques AI safety research for excessive anthropomorphism, arguing researchers inappropriately project human qualities like "intention" and "feelings" onto AI systems. The study examines Anthropic's research on language models and proposes that the real risk lies not in emergent agency but in structural incoherence combined with anthropomorphic projections.

🏢 Anthropic
AINeutralarXiv – CS AI · Mar 177/10
🧠

Agentic AI, Retrieval-Augmented Generation, and the Institutional Turn: Legal Architectures and Financial Governance in the Age of Distributional AGI

This research paper examines how agentic AI systems that can act autonomously challenge existing legal and financial regulatory frameworks. The authors argue that AI governance must shift from model-level alignment to institutional governance structures that create compliant behavior through mechanism design and runtime constraints.

AI × CryptoBullishCoinDesk · Mar 167/10
🤖

AI-linked crypto tokens surge as Nvidia's Jensen Huang touts agentic future

Nvidia CEO Jensen Huang predicted $1 trillion in chip demand through 2027 while praising the development of agentic AI systems and OpenClaw. His bullish AI outlook has driven up AI-linked cryptocurrency tokens as investors anticipate increased demand for AI infrastructure.

AI-linked crypto tokens surge as Nvidia's Jensen Huang touts agentic future
🏢 Nvidia
AIBullisharXiv – CS AI · Mar 167/10
🧠

ARL-Tangram: Unleash the Resource Efficiency in Agentic Reinforcement Learning

Researchers introduced ARL-Tangram, a resource management system that optimizes cloud resource allocation for agentic reinforcement learning tasks involving large language models. The system achieves up to 4.3x faster action completion times and 71.2% resource savings through action-level orchestration, and has been deployed for training MiMo series models.

AIBullisharXiv – CS AI · Mar 117/10
🧠

Meissa: Multi-modal Medical Agentic Intelligence

Researchers have developed Meissa, a lightweight 4B-parameter medical AI model that brings advanced agentic capabilities offline for healthcare applications. The system matches frontier models like GPT in medical benchmarks while operating with 25x fewer parameters and 22x lower latency, addressing privacy and cost concerns in clinical settings.

🧠 Gemini
AIBullishAI News · Mar 107/10
🧠

Agentic AI in finance speeds up operational automation

Financial infrastructure provider SEI has partnered with IBM to modernize internal operations through agentic AI and automation. The initiative focuses on process redesign and system updates to create data-enabled foundations for consistent client experiences in financial services.

AINeutralarXiv – CS AI · Mar 97/10
🧠

From Features to Actions: Explainability in Traditional and Agentic AI Systems

Researchers demonstrate that traditional explainable AI methods designed for static predictions fail when applied to agentic AI systems that make sequential decisions over time. The study shows attribution-based explanations work well for static tasks but trace-based diagnostics are needed to understand failures in multi-step AI agent behaviors.

AINeutralarXiv – CS AI · Mar 57/10
🧠

The Controllability Trap: A Governance Framework for Military AI Agents

Researchers propose the Agentic Military AI Governance Framework (AMAGF) to address control failures in autonomous military AI systems. The framework introduces a Control Quality Score (CQS) to continuously measure and manage human control over AI agents throughout operations, moving beyond binary control models.

AIBullisharXiv – CS AI · Mar 57/10
🧠

Robustness of Agentic AI Systems via Adversarially-Aligned Jacobian Regularization

Researchers introduce Adversarially-Aligned Jacobian Regularization (AAJR), a new method to improve the robustness of autonomous AI agent systems by controlling sensitivity along adversarial directions rather than globally. This approach maintains better performance while ensuring stability in multi-agent AI ecosystems compared to existing methods.

AIBullisharXiv – CS AI · Mar 57/10
🧠

Agentics 2.0: Logical Transduction Algebra for Agentic Data Workflows

Researchers have introduced Agentics 2.0, a Python framework for building enterprise-grade AI agent workflows using logical transduction algebra. The framework addresses reliability, scalability, and observability challenges in deploying agentic AI systems beyond research prototypes.

AINeutralarXiv – CS AI · Mar 56/10
🧠

From Privacy to Trust in the Agentic Era: A Taxonomy of Challenges in Trustworthy Federated Learning Through the Lens of Trust Report 2.0

Researchers propose Trustworthy Federated Learning (TFL) framework that treats trust as a continuously maintained system condition rather than static property, addressing challenges in AI systems with autonomous decision-making. The framework introduces Trust Report 2.0 as a privacy-preserving coordination blueprint for multi-stakeholder governance in federated learning deployments.

AIBullisharXiv – CS AI · Mar 57/10
🧠

An LLM Agentic Approach for Legal-Critical Software: A Case Study for Tax Prep Software

Researchers developed a multi-agent LLM system that translates legal statutes into executable software, using U.S. tax preparation as a test case. The system achieved a 45% success rate using GPT-4o-mini, significantly outperforming larger frontier models like GPT-4o and Claude 3.5 which only achieved 9-15% success rates on complex tax code tasks.

🧠 GPT-4🧠 Claude
AIBullishMIT Technology Review · Mar 46/103
🧠

Bridging the operational AI gap

Enterprises are moving beyond AI pilot projects to full production deployment, with companies actively redirecting budgets and resources toward AI implementation. Organizations are beginning to experiment with agentic AI systems that promise enhanced automation capabilities.

AIBullisharXiv – CS AI · Mar 37/103
🧠

GEM: A Gym for Agentic LLMs

Researchers introduced GEM (General Experience Maker), an open-source environment simulator designed for training large language models through experience-based learning rather than static datasets. The framework provides a standardized interface similar to OpenAI-Gym but specifically optimized for LLMs, featuring diverse environments, integrated tools, and compatibility with popular RL training frameworks.

$MKR
AI × CryptoNeutralBankless · Feb 277/106
🤖

Jack Dorsey's Block Slashes 40% of Workforce, Credits AI Efficiency

Jack Dorsey's Block has laid off 40% of its workforce, citing AI efficiency as the reason for the massive reduction. The cryptocurrency XYZ surged 20% overnight as investors anticipate improved productivity through the company's agentic AI implementation.

AIBullisharXiv – CS AI · Feb 277/109
🧠

ArchAgent: Agentic AI-driven Computer Architecture Discovery

ArchAgent, an AI-driven system built on AlphaEvolve, has achieved breakthrough results in automated computer architecture discovery by designing state-of-the-art cache replacement policies. The system achieved 5.3% performance improvements in just 2 days and 0.9% improvements in 18 days, working 3-5x faster than human-developed solutions.

Page 1 of 3Next →