y0news

#agent-frameworks News & Analysis

7 articles tagged with #agent-frameworks. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts

Researchers introduce AgencyBench, a comprehensive benchmark for evaluating autonomous AI agents across 32 real-world scenarios requiring up to 1 million tokens of context and up to 90 tool calls. The evaluation reveals that closed-source models such as Claude significantly outperform open-source alternatives (48.4% vs. 32.1%), with notable performance variation depending on the execution framework and model optimization.
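The setup described above — long-horizon episodes capped by a tool-call and token budget — can be sketched as a tiny harness. This is purely illustrative; AgencyBench's actual scenarios, tools, and scoring are not shown in the summary, so every name below is an assumption.

```python
from dataclasses import dataclass

@dataclass
class EpisodeBudget:
    """Tracks usage against the per-episode ceilings the paper reports."""
    max_tool_calls: int = 90       # up to 90 tool calls per scenario
    max_tokens: int = 1_000_000    # up to 1M tokens of context
    tool_calls: int = 0
    tokens: int = 0

    def charge(self, calls: int, tokens: int) -> bool:
        """Record usage; return False once either budget is exhausted."""
        self.tool_calls += calls
        self.tokens += tokens
        return (self.tool_calls <= self.max_tool_calls
                and self.tokens <= self.max_tokens)

def run_episode(agent_step, budget: EpisodeBudget) -> bool:
    """Drive a stub agent until it finishes or exhausts its budget.

    agent_step() returns (tool_calls_used, tokens_used, done).
    """
    done = False
    while not done:
        calls, tokens, done = agent_step()
        if not budget.charge(calls, tokens):
            return False  # budget exceeded: the episode counts as a failure
    return True
```

An agent that solves a task in fewer steps than the budget passes; one that loops past either ceiling fails, which is how such a harness separates strong from weak long-horizon behavior.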

🧠 Claude
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Researchers introduce SkillX, an automated framework for building reusable skill knowledge bases for AI agents that addresses inefficiencies in current self-evolving paradigms. The system uses multi-level skill design, iterative refinement, and exploratory expansion to create plug-and-play skill libraries that improve task success and execution efficiency across different agents and environments.
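The "plug-and-play skill library" with iterative refinement can be pictured as a small registry. The paper's real API is not given in the summary, so the class and method names here are assumptions, not SkillX's interface.

```python
from typing import Callable, Dict

class SkillLibrary:
    """A minimal registry mapping skill names to reusable callables."""

    def __init__(self) -> None:
        self._skills: Dict[str, Callable] = {}

    def register(self, name: str, fn: Callable) -> None:
        """Add a new skill so any agent can invoke it by name."""
        self._skills[name] = fn

    def refine(self, name: str, wrapper: Callable) -> None:
        """Iteratively refine a skill by wrapping its current version."""
        current = self._skills[name]
        self._skills[name] = wrapper(current)

    def invoke(self, name: str, *args, **kwargs):
        """Plug-and-play use: call a stored skill without knowing its origin."""
        return self._skills[name](*args, **kwargs)
```

Because skills are looked up by name rather than baked into one agent, the same library can be shared across different agents and environments, which is the reuse property the summary highlights.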

AI · Bearish · arXiv – CS AI · Apr 6 · 7/10

A Systematic Security Evaluation of OpenClaw and Its Variants

A comprehensive security evaluation of six OpenClaw-series AI agent frameworks reveals substantial vulnerabilities across all tested systems, with agentized systems proving significantly riskier than their underlying models. The study identified reconnaissance and discovery behaviors as the most common weaknesses, while highlighting that security risks are amplified through multi-step planning and runtime orchestration capabilities.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

MASEval: Extending Multi-Agent Evaluation from Models to Systems

MASEval introduces a framework-agnostic evaluation library for multi-agent AI systems that treats the entire system, rather than just the model, as the unit of analysis. Experiments spanning three benchmarks and multiple models and frameworks reveal that framework choice affects performance as much as model selection, challenging current model-centric evaluation approaches.
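Treating the system as the unit of analysis amounts to scoring every (model, framework) pair instead of models alone, so framework effects become visible. A hedged sketch of that idea, with all names and scores made up rather than taken from MASEval:

```python
from itertools import product

def evaluate_grid(models, frameworks, score_fn):
    """Score every (model, framework) combination, not models in isolation."""
    return {(m, f): score_fn(m, f) for m, f in product(models, frameworks)}

def framework_spread(results):
    """Per-model gap between its best and worst framework score.

    A large spread means framework choice matters as much as the model.
    """
    spread = {}
    for (m, _), s in results.items():
        lo, hi = spread.get(m, (s, s))
        spread[m] = (min(lo, s), max(hi, s))
    return {m: hi - lo for m, (lo, hi) in spread.items()}
```

If the spread for a model rivals the score differences between models, a model-centric leaderboard is hiding a system-level variable — the finding the summary describes.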

AI · Neutral · arXiv – CS AI · Mar 3 · 6/10

Agents Learn Their Runtime: Interpreter Persistence as Training-Time Semantics

Researchers found that AI agents perform better when their training data matches their deployment environment, specifically regarding interpreter state persistence. Models trained with persistent state but deployed in stateless environments trigger errors in 80% of cases, while the reverse wastes 3.5x more tokens through redundant computations.
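The mismatch the paper describes comes down to whether the code interpreter keeps its namespace alive between tool calls. A minimal illustration (not the paper's code): an agent trained against persistent semantics will reference variables from earlier calls, which fails under a stateless runtime.

```python
class PersistentInterpreter:
    """Keeps one namespace alive across run() calls, like a live REPL."""

    def __init__(self) -> None:
        self.ns: dict = {}

    def run(self, code: str) -> None:
        exec(code, self.ns)  # earlier definitions remain visible

class StatelessInterpreter:
    """Fresh namespace per call: earlier definitions are lost."""

    def run(self, code: str) -> dict:
        ns: dict = {}
        exec(code, ns)
        return ns
```

Under the persistent interpreter, running `x = 1` and then `y = x + 1` succeeds; under the stateless one, the second call raises `NameError` — the error mode reported for persistence-trained models in stateless deployments. The reverse mismatch avoids errors but forces agents to re-emit all prior code each call, which is where the reported token overhead comes from.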

AI · Bullish · Hugging Face Blog · Feb 12 · 6/10

OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments

The article presents OpenEnv, a framework for evaluating tool-using AI agents, focusing on how well agents interact with and utilize various tools when deployed in practical, real-world scenarios rather than controlled laboratory settings.