y0news

#ai-agents News & Analysis

449 articles tagged with #ai-agents. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · Hugging Face Blog · Feb 18 · 6/10 · 6

IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

IBM and UC Berkeley collaborated to develop IT-Bench and MAST diagnostic tools to identify and analyze failure points in enterprise AI agent deployments. The research addresses critical gaps in understanding why AI agents underperform in real-world business environments compared to controlled testing scenarios.

AI · Bullish · Hugging Face Blog · Feb 12 · 6/10 · 6

OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments

OpenEnv is a framework for evaluating tool-using AI agents in real-world environments. The work tests how well agents interact with and use tools in practical deployments rather than in controlled laboratory settings.

AI · Bullish · MIT News – AI · Feb 5 · 6/10 · 5

Helping AI agents search to get the best results out of large language models

EnCompass is a new system that helps AI agents work more efficiently by using backtracking and multiple attempts to find the best outputs from large language models. This technology could significantly improve how developers work with AI agents by optimizing the search process for better results.

AI · Neutral · OpenAI News · Jan 28 · 6/10 · 5

Keeping your data safe when an AI agent clicks a link

OpenAI has implemented safeguards to protect user data when AI agents interact with external links, addressing potential security vulnerabilities. The measures focus on preventing URL-based data exfiltration and prompt injection attacks that could compromise user information.

AI · Neutral · OpenAI News · Jan 23 · 5/10 · 4

Unrolling the Codex agent loop

This article provides a technical deep dive into the Codex agent loop architecture, detailing how the Codex CLI system orchestrates AI models, tools, prompts, and performance monitoring through the Responses API. The analysis focuses on the technical implementation and workflow of the Codex agent system.

AI · Bullish · Microsoft Research Blog · Jan 20 · 6/10 · 1

Multimodal reinforcement learning with agentic verifier for AI agents

Microsoft Research introduces Argos, a multimodal reinforcement learning approach that uses an agentic verifier to evaluate whether AI agents' reasoning aligns with their observations over time. The system reduces visual hallucinations and creates more reliable, data-efficient agents for real-world applications.

AI · Neutral · VentureBeat – AI · Jan 19 · 6/10 · 4

Claude Code costs up to $200 a month. Goose does the same thing for free.

Block has released Goose, a free open-source AI coding agent that provides similar functionality to Anthropic's Claude Code, which costs $20-200 per month. Goose runs locally on users' machines without subscription fees or usage limits, addressing developer frustrations with Claude Code's pricing and rate restrictions.

AI · Bullish · OpenAI News · Jan 8 · 6/10 · 2

Netomi’s lessons for scaling agentic systems into the enterprise

Netomi demonstrates how to scale enterprise AI agents using GPT-4.1 and GPT-5.2 by implementing concurrency, governance frameworks, and multi-step reasoning capabilities. The approach focuses on creating reliable production workflows that can handle enterprise-scale AI agent deployments.

AI · Bullish · Hugging Face Blog · Jan 5 · 6/10 · 5

NVIDIA brings agents to life with DGX Spark and Reachy Mini

NVIDIA announced DGX Spark and Reachy Mini, new hardware solutions designed to bring AI agents to life with enhanced physical interaction capabilities. These products represent NVIDIA's expansion into embodied AI and robotics applications.

AI · Neutral · IEEE Spectrum – AI · Dec 31 · 6/10 · 5

The Top 6 AI Stories of 2025

IEEE Spectrum's analysis of 2025's top AI stories reveals a year of maturation rather than hype, with generative AI moving from novelty to routine use while facing growing scrutiny over environmental costs, reliability issues, and practical limitations. The coverage highlights both breakthrough applications in areas like weather forecasting and coding assistance, as well as persistent challenges including water consumption, different failure modes compared to human errors, and the proliferation of AI-generated content.

AI · Bullish · Microsoft Research Blog · Dec 11 · 6/10 · 3

Agent Lightning: Adding reinforcement learning to AI agents without code rewrites

Microsoft Research introduced Agent Lightning, a system that enables developers to add reinforcement learning capabilities to AI agents without requiring code rewrites. The system decouples agent functionality from training processes, converting each agent action into reinforcement learning data to improve performance with minimal code changes.

AI · Bullish · OpenAI News · Dec 1 · 5/10 · 6

Inside Mirakl's agentic commerce vision

Mirakl is leveraging AI agents and ChatGPT Enterprise to transform commerce operations, focusing on improved documentation processes and enhanced customer support capabilities. The company is developing Mirakl Nexus as part of its broader vision to create agent-native commerce experiences.

AI · Bullish · OpenAI News · Oct 6 · 6/10 · 6

Introducing AgentKit, new Evals, and RFT for agents

OpenAI has released new developer tools including AgentKit, expanded evaluation capabilities, and reinforcement fine-tuning specifically designed for AI agents. These tools aim to accelerate the development process from prototype to production deployment for AI agent applications.

AI · Bullish · Hugging Face Blog · Sep 23 · 6/10 · 6

Smol2Operator: Post-Training GUI Agents for Computer Use

Smol2Operator covers post-training GUI agents for computer use. The work is an advance in AI agents that can interact with graphical user interfaces autonomously.

AI · Bullish · OpenAI News · Aug 12 · 5/10 · 6

Scaling accounting capacity with OpenAI

Basis has developed AI agents using OpenAI's latest models (o3, o3-Pro, GPT-4.1, and GPT-5) to help accounting firms automate tasks and save up to 30% of their time. The technology enables accounting firms to expand their capacity for advisory services and business growth by reducing manual work.

AI · Bullish · Google Research Blog · Aug 1 · 6/10 · 7

MLE-STAR: A state-of-the-art machine learning engineering agent

MLE-STAR represents a new state-of-the-art machine learning engineering agent that advances automated ML capabilities. The development showcases continued progress in AI automation tools for machine learning workflows.

AI · Bullish · OpenAI News · Jun 26 · 5/10 · 6

Customizable, no-code voice agent automation with GPT-4o

Retell AI has launched a no-code platform for AI voice automation powered by GPT-4o and GPT-4.1, enabling businesses to deploy natural voice agents for call centers. The platform aims to reduce call costs, improve customer satisfaction, and automate conversations without requiring scripts or causing hold times.

AI · Bullish · Hugging Face Blog · Jun 3 · 6/10 · 7

Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H

Holo1 represents a new family of Vision-Language Models (VLMs) specifically designed for GUI automation, powering the GUI agent Surfer-H. This development advances AI's ability to interact with graphical user interfaces autonomously.

AI · Bullish · OpenAI News · May 21 · 6/10 · 7

New tools and features in the Responses API

The Responses API has introduced new capabilities including Remote MCP, image generation, and Code Interpreter functionality. These updates are designed to enhance AI agent performance using GPT-4o and o-series models while improving reliability and efficiency.

AI · Bullish · OpenAI News · May 16 · 6/10 · 5

Introducing Codex

Codex is a new cloud-based software engineering agent powered by codex-1 that enables developers to deploy multiple AI agents simultaneously for parallel coding tasks. The platform can handle various development activities including writing features, answering codebase questions, fixing bugs, and creating pull requests for review.

AI · Neutral · OpenAI News · Apr 2 · 6/10 · 7

PaperBench: Evaluating AI’s Ability to Replicate AI Research

PaperBench is a new benchmark designed to evaluate AI agents' ability to replicate state-of-the-art AI research. This tool aims to measure how effectively AI systems can reproduce complex research methodologies and findings.

AI · Bullish · OpenAI News · Mar 27 · 6/10 · 8

Moving from intent-based bots to proactive AI agents

The article discusses the evolution from intent-based bots to proactive AI agents, representing a shift towards more autonomous and anticipatory artificial intelligence systems. This transition suggests AI systems are moving beyond reactive responses to user commands toward predictive and self-initiated actions.

AI · Bullish · OpenAI News · Mar 11 · 5/10 · 7

New tools for building agents

OpenAI is introducing new tools designed to help developers and enterprises build more useful and reliable AI agents. The announcement extends its existing platform with infrastructure focused on agent development.

AI · Bullish · OpenAI News · Feb 2 · 6/10 · 5

Introducing deep research

OpenAI has launched deep research, an AI research agent that can synthesize large amounts of online information and complete complex multi-step research tasks through advanced reasoning capabilities. The tool is currently available to Pro users, with rollout planned for Plus and Team subscribers.
