STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems
Researchers introduce STARS, a framework for continuously auditing AI agent skill invocations in real time by combining static capability analysis with request-conditioned risk modeling. The approach demonstrates improved detection of prompt injection attacks compared to static baselines, though it remains most valuable as a triage layer rather than a complete replacement for pre-deployment screening.
The rapid proliferation of autonomous AI agents equipped with external tools and skills has created a critical security challenge: determining whether a particular tool invocation is safe within its specific operational context. Traditional static auditing examines capability surfaces before deployment but cannot account for dynamic risk factors emerging at runtime. STARS addresses this gap by formulating skill invocation safety as a continuous risk-estimation problem, enabling ranking and prioritization of potentially problematic actions before execution occurs.
The framework combines three components: a static capability prior establishing baseline risk profiles, a request-conditioned invocation risk model that evaluates specific user requests against tool behavior, and a calibrated risk-fusion policy that synthesizes these signals. The researchers constructed SIA-Bench, a benchmark dataset of 3,000 labeled invocation records including indirect prompt injection scenarios, to evaluate their approach against existing methods.
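The three-component design can be illustrated with a minimal sketch of the risk-fusion step. The function names, weights, and logit-space fusion below are illustrative assumptions, not the paper's actual policy; the point is simply that a static prior and a request-conditioned score are combined into a single calibrated risk used to rank candidate invocations before execution.

```python
import math

def fuse_risk(static_prior: float, contextual_risk: float,
              w_static: float = 0.4, w_context: float = 0.6,
              bias: float = 0.0) -> float:
    """Combine a static capability prior with a request-conditioned
    risk score via weighted fusion in logit space.
    Weights and bias are hypothetical placeholders, not paper values.
    """
    def logit(p: float) -> float:
        p = min(max(p, 1e-6), 1 - 1e-6)  # clamp away from 0 and 1
        return math.log(p / (1 - p))

    z = w_static * logit(static_prior) + w_context * logit(contextual_risk) + bias
    return 1.0 / (1.0 + math.exp(-z))  # squash back to a [0, 1] risk score

# Rank candidate invocations by fused risk so the riskiest are reviewed first.
invocations = [
    {"skill": "read_file",  "static": 0.2, "context": 0.1},
    {"skill": "send_email", "static": 0.5, "context": 0.9},
]
ranked = sorted(invocations,
                key=lambda r: fuse_risk(r["static"], r["context"]),
                reverse=True)
```

In practice the weights and bias would be fit on labeled invocation records (e.g., a benchmark like SIA-Bench) so that the fused score is calibrated, rather than hand-set as here.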
Results show meaningful but modest improvements in detecting high-risk invocations, with calibrated fusion achieving 0.439 AUPRC on indirect prompt injection attacks versus 0.405 for contextual-only and 0.380 for static-only baselines. However, performance gains narrow on standard in-distribution test sets, indicating that static priors retain substantial value for routine scenarios. This suggests request-conditioned auditing functions optimally as an intermediate risk-scoring mechanism within a layered defense strategy rather than as a complete safety replacement.
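AUPRC (area under the precision-recall curve), the metric behind the numbers above, is commonly estimated as average precision over the ranked positives. A minimal stdlib sketch, to make the evaluation protocol concrete:

```python
def auprc(scores, labels):
    """Average-precision estimate of the area under the PR curve.

    scores: model risk scores (higher = riskier)
    labels: 1 for a true high-risk invocation, 0 otherwise
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    ap = 0.0
    for i in order:
        if labels[i]:
            tp += 1
            ap += tp / (tp + fp)  # precision at each recall step
        else:
            fp += 1
    return ap / sum(labels)
```

A perfect ranking yields 1.0, so scores such as 0.439 reflect how hard indirect prompt injection detection remains even for the best fused model.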
For the AI agent ecosystem, this work establishes that runtime context matters significantly for safety decisions, but also validates that no single approach eliminates the need for comprehensive multi-stage screening. Organizations deploying autonomous agents should implement complementary static and dynamic auditing rather than relying exclusively on either method.
- STARS combines static analysis with request-conditioned risk modeling to audit AI agent skill invocations at runtime
- Contextual auditing improves prompt injection detection but gains diminish on standard test cases, validating multi-stage screening approaches
- SIA-Bench benchmark provides 3,000 labeled invocation records for evaluating agent safety systems
- Dynamic risk-scoring serves best as a triage and prioritization layer alongside static pre-deployment screening
- Calibration of risk fusion policies proves critical for practical deployment in safety-sensitive applications