y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#safety-utility-tradeoff News & Analysis

1 article tagged with #safety-utility-tradeoff. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 articles
AIBullisharXiv – CS AI · 6h ago7/10
🧠

SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety

SafeHarbor is a new framework that enhances Large Language Model agent safety by using hierarchical memory and context-aware defense rules to prevent harmful tool use while maintaining utility on benign tasks. The system achieves 93%+ refusal rates against malicious requests while preserving 63.6% performance on legitimate tasks, addressing a critical trade-off in AI safety.

🧠 GPT-4