7 articles tagged with #agent-frameworks. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Neutral · arXiv – CS AI · Apr 14 · 7/10
🧠 Researchers introduce AgencyBench, a comprehensive benchmark for evaluating autonomous AI agents across 32 real-world scenarios requiring up to 1 million tokens and 90 tool calls. The evaluation reveals closed-source models like Claude significantly outperform open-source alternatives (48.4% vs 32.1%), with notable performance variations based on execution frameworks and model optimization.
🧠 Claude
AI · Bullish · arXiv – CS AI · Apr 7 · 7/10
🧠 Researchers introduce SkillX, an automated framework for building reusable skill knowledge bases for AI agents that addresses inefficiencies in current self-evolving paradigms. The system uses multi-level skill design, iterative refinement, and exploratory expansion to create plug-and-play skill libraries that improve task success and execution efficiency across different agents and environments.
AI · Bearish · arXiv – CS AI · Apr 6 · 7/10
🧠 A comprehensive security evaluation of six OpenClaw-series AI agent frameworks reveals substantial vulnerabilities across all tested systems, with agentized systems proving significantly riskier than their underlying models. The study identified reconnaissance and discovery behaviors as the most common weaknesses, while highlighting that security risks are amplified through multi-step planning and runtime orchestration capabilities.
AI · Bullish · arXiv – CS AI · Mar 11 · 7/10
🧠 MASEval introduces a framework-agnostic evaluation library for multi-agent AI systems that treats the entire system, rather than just the model, as the unit of analysis. Experiments across three benchmarks and multiple models and frameworks show that framework choice affects performance as much as model selection, challenging current model-centric evaluation approaches.
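The system-as-unit idea can be sketched in a few lines (a hedged illustration; the interface names here are invented for this sketch and are not MASEval's API): hide the model and framework behind one black-box callable, so differently built systems are scored identically.

```python
# Sketch: evaluate a whole multi-agent system as a black box.
# A "system" is any callable taking a task and returning an answer,
# regardless of which framework or models run inside it.
from typing import Callable

System = Callable[[str], str]

def evaluate(system: System, tasks: list[tuple[str, str]]) -> float:
    """Score a black-box system on (task, expected_answer) pairs."""
    correct = sum(system(task) == expected for task, expected in tasks)
    return correct / len(tasks)

# Two toy "systems" built differently but evaluated identically:
def single_agent(task: str) -> str:
    return task.upper()

def two_agent_pipeline(task: str) -> str:
    draft = task.strip()      # agent 1: normalize the input
    return draft.upper()      # agent 2: produce the final answer

tasks = [("hello", "HELLO"), (" world", "WORLD")]
print(evaluate(single_agent, tasks))        # 0.5 (fails the untrimmed task)
print(evaluate(two_agent_pipeline, tasks))  # 1.0
```

Because both systems expose the same callable interface, swapping the framework inside the pipeline changes the score without changing the harness, which is exactly the comparison a model-centric evaluation cannot make.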
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠 Researchers introduce CLAG, a clustering-based memory framework that helps small language model agents organize and retrieve information more effectively. The system addresses memory dilution issues by creating semantic clusters with automated profiles, showing improved performance across multiple QA datasets.
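A minimal sketch of the clustering idea (an illustration under assumptions, not CLAG's implementation — it uses word-overlap similarity as a stand-in for the paper's semantic embeddings): group stored memories into clusters with a profile, then match queries against profiles instead of scanning every entry, which is what counters dilution.

```python
# Sketch: clustering-based agent memory with per-cluster profiles.

def tokens(text):
    return set(text.lower().split())

def overlap(a, b):
    """Jaccard word overlap as a stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

class ClusteredMemory:
    def __init__(self, threshold=0.2):
        self.clusters = []        # list of (profile_text, [memories])
        self.threshold = threshold

    def add(self, memory):
        # Attach to the closest cluster profile, or start a new cluster.
        best, score = None, 0.0
        for cluster in self.clusters:
            s = overlap(cluster[0], memory)
            if s > score:
                best, score = cluster, s
        if best and score >= self.threshold:
            best[1].append(memory)
        else:
            # First memory seeds the new cluster's profile.
            self.clusters.append((memory, [memory]))

    def retrieve(self, query):
        # Match the query against profiles only, then return that cluster.
        best = max(self.clusters, key=lambda c: overlap(c[0], query))
        return best[1]

memory = ClusteredMemory()
memory.add("python sorting list errors")
memory.add("python list index out of range")
memory.add("booking flights to paris")
print(memory.retrieve("python list bug"))
```

Retrieval here touches one profile per cluster rather than every stored memory, so unrelated entries (the travel note) never dilute a coding-related lookup.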
AI · Neutral · arXiv – CS AI · Mar 3 · 6/10
🧠 Researchers found that AI agents perform better when their training data matches their deployment environment, specifically regarding interpreter state persistence. Models trained with persistent state but deployed in stateless environments trigger errors in 80% of cases, while the reverse wastes 3.5x more tokens through redundant computations.
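The failure mode is easy to see in miniature (a toy sketch, not the paper's setup): an agent trained on a persistent interpreter emits snippets that reference earlier turns, which break under a stateless executor.

```python
# Sketch: the same agent-generated snippets run under persistent
# vs stateless execution. The agent assumes a notebook-like kernel.
snippets = [
    "data = [3, 1, 2]",
    "data.sort()",          # refers to `data` from the previous turn
    "result = data[0]",
]

def run_persistent(snippets):
    """One shared namespace across turns, like a Jupyter kernel."""
    ns = {}
    for s in snippets:
        exec(s, ns)
    return ns.get("result")

def run_stateless(snippets):
    """Fresh namespace per turn: cross-turn references fail."""
    errors = 0
    for s in snippets:
        try:
            exec(s, {})
        except NameError:
            errors += 1
    return errors

print(run_persistent(snippets))  # 1
print(run_stateless(snippets))   # 2 snippets fail: `data` never persists
```

The mirror-image waste the paper measures follows from the same mismatch: a model trained for stateless execution re-emits the setup code (`data = [3, 1, 2]`) in every turn even when the deployment interpreter would have kept it alive.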
AI · Bullish · Hugging Face Blog · Feb 12 · 6/10
🧠 The article introduces OpenEnv, a framework for evaluating AI agents that use tools in real-world environments, focusing on how well agents can interact with and utilize various tools in practical deployments rather than controlled laboratory settings.