arXiv, CS AI · Feb 27
A Decision-Theoretic Formalisation of Steganography With Applications to LLM Monitoring
Researchers have developed a new decision-theoretic framework to detect steganographic capabilities in large language models, which could help identify when AI systems are hiding information to evade oversight. The method introduces 'generalized V-information' and a 'steganographic gap' measure to quantify hidden communication without requiring reference distributions.