
Logit Distance Bounds Representational Similarity

arXiv – CS AI | Beatrix M. G. Nielsen, Emanuele Marconato, Luigi Gresele, Andrea Dittadi, Simon Buchholz | June 25, 2026 at 04:00 AM

🤖 AI Summary

Researchers show that logit distance, a measure of how far apart two models' logits (their raw, pre-softmax outputs) are, bounds linear representational similarity between neural networks, whereas closeness in KL divergence provides no such guarantee. The findings reveal that KL-based distillation can preserve predictive accuracy while failing to maintain the linear structure of internal representations, with implications for transfer learning and model compression.

Analysis

This research addresses a fundamental question in machine learning: when do two neural networks that produce similar outputs also learn similar internal representations? The work builds on recent observations that closeness of output distributions in KL divergence does not guarantee linear representational similarity, a gap with significant practical consequences.

The theoretical contribution centers on logit distance, a metric that compares models' logits: the raw scores a model produces before softmax normalization turns them into probabilities. By proving that linear representational dissimilarity is bounded by logit distance, the authors establish a tighter link between output similarity and internal structure than KL divergence provides. This matters because KL-based distillation, a widespread technique for compressing large models into smaller ones, can preserve prediction accuracy while failing to preserve the linear structure through which representations encode human-interpretable concepts.
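A small numerical example helps show why closeness in KL divergence says little about logits. KL divergence compares probabilities, so two models can disagree sharply on the logit of a class that both assign near-zero probability and still have a KL divergence close to zero. The sketch below is illustrative only: it assumes "logit distance" is the squared Euclidean distance between per-input mean-centered logit vectors (centering removes the constant offset that softmax ignores). That definition may differ from the paper's, and all function names are hypothetical.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability; softmax is unchanged by this shift.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_logits, q_logits):
    # KL(p || q) between the softmax distributions of two logit vectors.
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def centered(logits):
    # Remove the per-input mean: softmax ignores a constant offset,
    # so only differences between logits affect the prediction.
    mean = sum(logits) / len(logits)
    return [z - mean for z in logits]


def logit_distance(a_logits, b_logits):
    # ASSUMED illustrative definition: squared Euclidean distance
    # between mean-centered logit vectors (not necessarily the paper's).
    return sum((x - y) ** 2 for x, y in zip(centered(a_logits), centered(b_logits)))


teacher = [10.0, 0.0, -20.0]
# The student differs only on the third class, which both models treat as near-impossible.
student = [10.0, 0.0, -5.0]

print(f"KL(teacher || student) = {kl_divergence(teacher, student):.2e}")
print(f"logit distance         = {logit_distance(teacher, student):.1f}")
```

Here the KL divergence is on the order of 1e-7 while the logit distance is 150: the two models make practically identical predictions yet assign very different scores to the third class. Both functions could also serve as per-example distillation losses, which is the contrast the article draws between KL-based and logit-distance distillation.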

The practical implications extend across machine learning applications. When practitioners distill a teacher model into a student model by minimizing KL divergence, they train the student to match the teacher's output probabilities, and in doing so can lose the linear structure needed for concept recovery and interpretability. The experiments show that logit-distance distillation produces students with substantially higher linear representational similarity to the teacher and better preservation of linearly recoverable concepts.
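The article does not spell out how linear representational similarity is measured. A common, simple proxy, assumed here and not necessarily the paper's metric, is to fit an affine map from one model's representations to another's by least squares and report the fraction of variance explained (R²). The arrays below are synthetic stand-ins, not data from the paper, and `linear_r2` is a hypothetical helper.

```python
import numpy as np


def linear_r2(source, target):
    """Fraction of variance in `target` explained by an affine map of `source`.

    source: (n_samples, d_source) representations from one model
    target: (n_samples, d_target) representations from another model
    """
    # Append a column of ones so the fitted map includes a bias term.
    design = np.hstack([source, np.ones((source.shape[0], 1))])
    # Least-squares fit of target ~ design @ coef, all output dims at once.
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    residual = target - design @ coef
    centered = target - target.mean(axis=0)
    # 1 - RSS / TSS, pooled over every output dimension.
    return 1.0 - (residual ** 2).sum() / (centered ** 2).sum()


rng = np.random.default_rng(0)
# Synthetic "teacher" representations (hypothetical data: 500 inputs x 4 dims).
teacher_reps = rng.normal(size=(500, 4))

# Student 1: an affine transform of the teacher's representations.
linear_student = teacher_reps @ rng.normal(size=(4, 4)) + 1.0
# Student 2: an invertible but nonlinear (elementwise cubic) transform.
cubic_student = teacher_reps ** 3
# Student 3: representations unrelated to the teacher's.
unrelated_student = rng.normal(size=(500, 4))

for name, reps in [("linear", linear_student),
                   ("cubic", cubic_student),
                   ("unrelated", unrelated_student)]:
    print(f"{name:>9}: R^2 = {linear_r2(teacher_reps, reps):.3f}")
```

The affine student scores R² of essentially 1 and the unrelated one scores near 0. The cubic student carries the same information as the teacher (the transform is invertible), yet a linear map recovers only part of it; this is the kind of gap a linear-similarity measure exposes and an objective that only matches output probabilities need not penalize.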

For the AI community, this calls existing distillation practice into question and suggests logit distance as the default objective for compression pipelines that prioritize interpretability. Organizations deploying distilled models in sensitive applications, such as medical diagnosis, financial forecasting, or policy-relevant decisions, should evaluate whether KL-based compression compromises the conceptual alignment needed for explainability and trustworthiness.

Key Takeaways

→Logit distance bounds linear representational similarity more tightly than KL divergence does
→KL-based model distillation can preserve prediction accuracy while failing to preserve linear representational structure
→Logit-distance distillation preserves linearly recoverable concepts better than standard KL-divergence distillation
→The findings apply to broad model families including autoregressive language models
→Results highlight a critical trade-off between output fidelity and representation preservation in compression