7 articles tagged with #agent-frameworks. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv – CS AI · Apr 14 · 7/10
🧠 Researchers introduce AgencyBench, a comprehensive benchmark for evaluating autonomous AI agents across 32 real-world scenarios requiring up to 1 million tokens and 90 tool calls. The evaluation reveals closed-source models like Claude significantly outperform open-source alternatives (48.4% vs 32.1%), with notable performance variations based on execution frameworks and model optimization.
🧠 Claude
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
🧠 Researchers introduce SkillX, an automated framework for building reusable skill knowledge bases for AI agents that addresses inefficiencies in current self-evolving paradigms. The system uses multi-level skill design, iterative refinement, and exploratory expansion to create plug-and-play skill libraries that improve task success and execution efficiency across different agents and environments.
AI · Bearish · arXiv – CS AI · Apr 6 · 7/10
🧠 A comprehensive security evaluation of six OpenClaw-series AI agent frameworks reveals substantial vulnerabilities across all tested systems, with agentized systems proving significantly riskier than their underlying models. The study identified reconnaissance and discovery behaviors as the most common weaknesses, while highlighting that security risks are amplified through multi-step planning and runtime orchestration capabilities.
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 MASEval introduces a framework-agnostic evaluation library for multi-agent AI systems that treats the entire system, rather than just the model, as the unit of analysis. Experiments across three benchmarks and multiple models and frameworks show that framework choice affects performance as much as model selection, challenging current model-centric evaluation approaches.
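The system-as-unit idea can be sketched in a few lines (a hedged illustration; the interface names here are invented for this sketch and are not MASEval's API): hide the model and framework behind one black-box callable, so differently built systems are scored identically.

```python
# Sketch: evaluate a whole multi-agent system as a black box.
# A "system" is any callable taking a task and returning an answer,
# regardless of which framework or models run inside it.
from typing import Callable

System = Callable[[str], str]

def evaluate(system: System, tasks: list[tuple[str, str]]) -> float:
    """Score a black-box system on (task, expected_answer) pairs."""
    correct = sum(system(task) == expected for task, expected in tasks)
    return correct / len(tasks)

# Two toy "systems" built differently but evaluated identically:
def single_agent(task: str) -> str:
    return task.upper()

def two_agent_pipeline(task: str) -> str:
    draft = task.strip()      # agent 1: normalize the input
    return draft.upper()      # agent 2: produce the final answer

tasks = [("hello", "HELLO"), (" world", "WORLD")]
print(evaluate(single_agent, tasks))        # 0.5 (fails the untrimmed task)
print(evaluate(two_agent_pipeline, tasks))  # 1.0
```

Because both systems expose the same callable interface, swapping the framework inside the pipeline changes the score without changing the harness, which is exactly the comparison a model-centric evaluation cannot make.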
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce CLAG, a clustering-based memory framework that helps small language model agents organize and retrieve information more effectively. The system addresses memory dilution issues by creating semantic clusters with automated profiles, showing improved performance across multiple QA datasets.
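A minimal sketch of the clustering idea (an illustration under assumptions, not CLAG's implementation — it uses word-overlap similarity as a stand-in for the paper's semantic embeddings): group stored memories into clusters with a profile, then match queries against profiles instead of scanning every entry, which is what counters dilution.

```python
# Sketch: clustering-based agent memory with per-cluster profiles.

def tokens(text):
    return set(text.lower().split())

def overlap(a, b):
    """Jaccard word overlap as a stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

class ClusteredMemory:
    def __init__(self, threshold=0.2):
        self.clusters = []        # list of (profile_text, [memories])
        self.threshold = threshold

    def add(self, memory):
        # Attach to the closest cluster profile, or start a new cluster.
        best, score = None, 0.0
        for cluster in self.clusters:
            s = overlap(cluster[0], memory)
            if s > score:
                best, score = cluster, s
        if best and score >= self.threshold:
            best[1].append(memory)
        else:
            # First memory seeds the new cluster's profile.
            self.clusters.append((memory, [memory]))

    def retrieve(self, query):
        # Match the query against profiles only, then return that cluster.
        best = max(self.clusters, key=lambda c: overlap(c[0], query))
        return best[1]

memory = ClusteredMemory()
memory.add("python sorting list errors")
memory.add("python list index out of range")
memory.add("booking flights to paris")
print(memory.retrieve("python list bug"))
```

Retrieval here touches one profile per cluster rather than every stored memory, so unrelated entries (the travel note) never dilute a coding-related lookup.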
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers found that AI agents perform better when their training data matches their deployment environment, specifically regarding interpreter state persistence. Models trained with persistent state but deployed in stateless environments trigger errors in 80% of cases, while the reverse wastes 3.5x more tokens through redundant computations.
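The failure mode is easy to see in miniature (a toy sketch, not the paper's setup): an agent trained on a persistent interpreter emits snippets that reference earlier turns, which break under a stateless executor.

```python
# Sketch: the same agent-generated snippets run under persistent
# vs stateless execution. The agent assumes a notebook-like kernel.
snippets = [
    "data = [3, 1, 2]",
    "data.sort()",          # refers to `data` from the previous turn
    "result = data[0]",
]

def run_persistent(snippets):
    """One shared namespace across turns, like a Jupyter kernel."""
    ns = {}
    for s in snippets:
        exec(s, ns)
    return ns.get("result")

def run_stateless(snippets):
    """Fresh namespace per turn: cross-turn references fail."""
    errors = 0
    for s in snippets:
        try:
            exec(s, {})
        except NameError:
            errors += 1
    return errors

print(run_persistent(snippets))  # 1
print(run_stateless(snippets))   # 2 snippets fail: `data` never persists
```

The mirror-image waste the paper measures follows from the same mismatch: a model trained for stateless execution re-emits the setup code (`data = [3, 1, 2]`) in every turn even when the deployment interpreter would have kept it alive.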
AI · Bullish · Hugging Face Blog · Feb 12 · 6/10
🧠 The article introduces OpenEnv, a framework for evaluating AI agents that use tools in real-world environments, focusing on how well agents can interact with and utilize various tools in practical deployments rather than controlled laboratory settings.