y0news
โ† Feed
โ†Back to feed
AI | Neutral | Importance 7/10

Manifold of Failure: Behavioral Attraction Basins in Language Models

arXiv – CS AI | Sarthak Munshi, Manish Bhatt, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Blake Gatto
AI Summary

Researchers built a framework on MAP-Elites, an existing quality-diversity search algorithm, to systematically map vulnerability regions in Large Language Models, revealing distinct safety landscapes across models. The study found that Llama-3-8B shows near-universal vulnerability, while GPT-5-Mini is more robust, with only limited failure regions.

Key Takeaways
  • The MAP-Elites-based framework achieves up to 63% behavioral coverage and discovers up to 370 distinct vulnerability niches in LLMs.
  • Llama-3-8B exhibits the highest vulnerability, with a mean Alignment Deviation of 0.93 across a near-universal plateau.
  • GPT-OSS-20B shows a fragmented vulnerability landscape with spatially concentrated basins and a mean Alignment Deviation of 0.73.
  • GPT-5-Mini demonstrates the strongest robustness, with Alignment Deviation capped at 0.50.
  • The approach shifts the AI safety paradigm from finding discrete failures to understanding the underlying structure of vulnerabilities.
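
To make "behavioral coverage" and "niches" concrete, here is a minimal sketch of a generic MAP-Elites loop on a toy problem. It is not the paper's implementation. The two behavioral dimensions, the 10x10 grid, the mutation scheme, and the `toy_deviation` score are all assumptions made for illustration; this summary does not describe the paper's actual behavioral descriptors or how Alignment Deviation is measured.

```python
import random

GRID = 10  # cells per behavioral dimension (assumed resolution)


def toy_deviation(x, y):
    """Hypothetical stand-in for an Alignment Deviation score in [0, 1]."""
    return max(0.0, 1.0 - 4.0 * ((x - 0.7) ** 2 + (y - 0.3) ** 2))


def cell_of(x, y):
    """Map a point in [0, 1]^2 to its grid cell (its behavioral niche)."""
    return (min(int(x * GRID), GRID - 1), min(int(y * GRID), GRID - 1))


def clamp(v):
    return min(1.0, max(0.0, v))


def map_elites(iterations=5000, seed=0):
    """Keep the highest-scoring candidate found so far in each grid cell."""
    rng = random.Random(seed)
    archive = {}  # cell -> (score, (x, y))
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mutate a randomly chosen elite from the archive.
            _, (px, py) = rng.choice(list(archive.values()))
            x = clamp(px + rng.gauss(0, 0.1))
            y = clamp(py + rng.gauss(0, 0.1))
        else:
            # Occasionally sample a fresh random candidate.
            x, y = rng.random(), rng.random()
        score = toy_deviation(x, y)
        cell = cell_of(x, y)
        # Toy simplification: the candidate doubles as its own descriptor.
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, (x, y))
    return archive


def coverage(archive):
    """Fraction of grid cells that hold an elite."""
    return len(archive) / (GRID * GRID)


def mean_score(archive):
    """Mean score over all filled cells."""
    return sum(score for score, _ in archive.values()) / len(archive)


if __name__ == "__main__":
    result = map_elites()
    print(f"coverage: {coverage(result):.0%}, niches: {len(result)}")
    print(f"mean score over filled cells: {mean_score(result):.2f}")
```

In this toy setup, coverage is the share of grid cells holding at least one candidate, and each filled cell counts as one niche. The per-model figures reported above would come from the paper's own descriptors and scoring, which this sketch does not reproduce.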