#representational-learning News & Analysis

2 articles tagged with #representational-learning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 11 · 7/10

When Roleplaying, Do Models Believe What They Say?

Researchers discover that when language models roleplay historical figures with different belief systems, they primarily change their outputs rather than their internal representations of truth. The study contrasts this with Emergent Misalignment, where models trained on harmful content actually internalize false beliefs, suggesting different degrees of belief internalization exist across model behaviors.

🧠 Llama

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink

Researchers demonstrate that single-bucket probes in Mamba-2 language models identify representational signatures but fail to capture complete computational circuits, missing up to half the execution layer. The study shows that probe-based mechanistic interpretability can conflate detection mechanisms with execution mechanisms, which has critical implications for model behavior: ablating identified head groups entirely collapses retrieval accuracy in downstream tasks.