AI · Neutral · arXiv – CS AI · 6h ago · 6/10
READER: Robust Evidence-based Authorship Decoding via Extracted Representations
Researchers introduce READER, a framework that identifies which large language model generated a given output by analyzing hidden activation patterns. The method identifies the source model with 70-84% accuracy across 50 diverse prompts, suggesting that model-specific authorship signals exist in frozen LLM representations and can be reliably extracted.