#llm-monitoring News & Analysis

2 articles tagged with #llm-monitoring. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles

AINeutralarXiv – CS AI · Feb 277/105

🧠

A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring

Researchers have developed a new decision-theoretic framework to detect steganographic capabilities in large language models, which could help identify when AI systems are hiding information to evade oversight. The method introduces 'generalized V-information' and a 'steganographic gap' measure to quantify hidden communication without requiring reference distributions.

AINeutralarXiv – CS AI · Jun 96/10

🧠

Trait-space Monitoring for Emergent Misalignment During Supervised Finetuning

Researchers have developed a method to detect emergent misalignment in large language models during finetuning by monitoring internal representational shifts rather than relying solely on behavioral evaluation. The technique identifies dangerous model behavior through a low-dimensional geometric signature in activation space, achieving high detection accuracy with minimal computational overhead.