AI · Neutral · arXiv – CS AI · 10h ago · 6/10
Cross-Family Universality of Behavioral Axes via Anchor-Projected Representations
Researchers introduce an anchor-projection framework that lets behavioral directions transfer across different large language model families by mapping each model's hidden representations into a shared coordinate space. The approach reports high cross-model alignment, reaching 0.83 accuracy on a ten-way detection task, without any fine-tuning. This suggests that interpretability and control mechanisms could be standardized across architecturally different models.
🧠 Llama
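
The summary does not spell out how the projection works, so the following is only a minimal sketch of one common way to build a shared coordinate space, not the paper's confirmed method. It assumes each hidden state is re-expressed as its cosine similarities to the same model's hidden states for a fixed set of anchor prompts. The second "model" here is a hypothetical toy (`toy_model_b`) whose states are a rotated and rescaled copy of the first model's; real model families differ in far more complicated ways. All names and numbers are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def anchor_project(h, anchors):
    """Re-express h as its similarities to the model's own anchor states."""
    return [cosine(h, a) for a in anchors]

def toy_model_b(v, theta=1.1, scale=3.0):
    """Hypothetical second model: model A's 2-D state, rotated and rescaled."""
    x, y = v
    c, s = math.cos(theta), math.sin(theta)
    return [scale * (c * x - s * y), scale * (s * x + c * y)]

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Hidden states of each model for the same anchor prompts (made-up values).
anchors_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
anchors_b = [toy_model_b(a) for a in anchors_a]

# Behavioral axis estimated in anchor space, using model A's examples only.
pos_a = [[2.0, 0.1], [1.5, -0.2]]    # states showing the behavior
neg_a = [[-1.0, 0.3], [-2.0, -0.1]]  # states not showing it
mean_pos = mean([anchor_project(h, anchors_a) for h in pos_a])
mean_neg = mean([anchor_project(h, anchors_a) for h in neg_a])
axis = [p - n for p, n in zip(mean_pos, mean_neg)]

def score_in_model_b(h_b):
    """Score a model-B state against the axis learned from model A."""
    proj = anchor_project(h_b, anchors_b)
    return sum(p * a for p, a in zip(proj, axis))

score_pos = score_in_model_b(toy_model_b([1.0, 0.2]))
score_neg = score_in_model_b(toy_model_b([-1.0, -0.2]))
print(score_pos > 0 > score_neg)
```

In this toy setting the transfer works exactly, because cosine similarity is unchanged by rotation and uniform rescaling. For real models the anchor coordinates would only match approximately, which is presumably why the summary reports a detection accuracy rather than exact agreement.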