y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#safety-diagnostics News & Analysis

1 article tagged with #safety-diagnostics. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 articles
AINeutralarXiv – CS AI · 8h ago7/10
🧠

Skin-Deep: A Geometric Diagnostic for Alignment Fragility in Large Language Model Representations

Researchers introduce Skin-Deep, a geometric diagnostic tool that detects fragility in AI safety alignment before attacks occur by analyzing hidden-state activations and producing a single Geometric Fragility Score. Testing across 21 instruction-tuned models reveals a recurring low-rank safety subspace, enabling pre-deployment identification of models vulnerable to refusal degradation through fine-tuning.