#logit-lens News & Analysis

2 articles tagged with #logit-lens. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles

AIBullisharXiv – CS AI · Jun 27/10

🧠

TriLens: Per-Layer Logit-Lens Entropy for White-Box Hallucination Detection

TriLens is a novel white-box detection method that identifies hallucinations in language models by tracking entropy changes across internal computational layers. Rather than examining only final outputs, the technique monitors uncertainty signals from multi-head attention, feed-forward networks, and residual streams using logit lens analysis, creating a compact 3L-dimensional trajectory that reveals how model confidence settles during inference.

AINeutralarXiv – CS AI · Jun 96/10

🧠

Query Lens: Interpreting Sparse Key-Value Features with Indirect Effects

Query Lens extends the Logit Lens technique to improve the interpretability of sparse autoencoders by analyzing both encoder key features and decoder value features, while accounting for indirect downstream effects. The research introduces the Subspace Channel Hypothesis, suggesting that neural modules process features through layer-specific subspaces, advancing understanding of how AI models process and manipulate information.