What Cohort INRs Encode and Where to Freeze Them
Researchers demonstrate that the early layers of cohort-trained Implicit Neural Representations (INRs) encode transferable features for signal fitting, and identify optimal freezing points through weight stable rank analysis. Using sparse autoencoders (SAEs) for mechanistic interpretability, they show that SIREN and Fourier-feature MLPs (FFMLPs) learn fundamentally different dictionary representations despite comparable performance, with implications for designing more generalizable neural architectures.
This research advances understanding of transfer learning in implicit neural representations by bridging the gap between which layers transfer and what those layers actually encode. The findings emerge from systematic analysis of two popular INR architectures, revealing that the optimal freeze depth coincides with maximum weight stable rank, a connection that suggests fundamental principles governing when to preserve learned features and when to keep fine-tuning them. Applying sparse autoencoders to INR activations is a notable methodological contribution to mechanistic interpretability, offering the first interpretable decomposition of how these models encode information.
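The stable rank of a weight matrix W is a standard quantity, the squared Frobenius norm over the squared spectral norm, ||W||_F^2 / ||W||_2^2. As a concrete illustration of the per-layer scan described above, here is a minimal PyTorch sketch; the four-layer coordinate MLP is an invented stand-in for this example, not the paper's model or code.

```python
import torch
import torch.nn as nn

def stable_rank(W: torch.Tensor) -> float:
    """Stable rank = ||W||_F^2 / sigma_max(W)^2."""
    fro2 = W.pow(2).sum()
    sigma_max = torch.linalg.svdvals(W)[0]  # largest singular value
    return (fro2 / sigma_max.pow(2)).item()

# Illustrative coordinate MLP standing in for a cohort-trained INR encoder.
inr = nn.Sequential(
    nn.Linear(2, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3),
)

# Scan per-layer stable rank; the depth where it peaks is a
# candidate freeze point under the paper's finding.
for i, m in enumerate(inr):
    if isinstance(m, nn.Linear):
        print(f"layer {i}: stable rank = {stable_rank(m.weight.detach()):.2f}")
```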
The contrasting dictionary structures of SIREN and FFMLP are particularly revealing. SIREN learns spatially localized atoms that tile coordinate space with position-specific firing patterns, while FFMLP develops image-spanning atoms that trace the contours of the cohort's signals. This qualitative difference explains how architecturally similar systems can achieve comparable fitting performance through fundamentally distinct representational strategies. The single-atom ablation experiments provide causal evidence for these interpretations, demonstrating that FFMLP's global atoms have broad impact across images while SIREN's localized atoms affect only their own spatial regions.
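To make the SAE decomposition and the single-atom ablation concrete, here is a minimal sketch of one common SAE formulation (ReLU encoder, linear decoder whose columns act as dictionary atoms, L1 sparsity penalty). The class, the ablation helper, and all dimensions are illustrative assumptions rather than the paper's implementation; in the paper's setup the ablated reconstruction would be patched back into the INR's forward pass and the fitted image's PSNR recomputed, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Overcomplete SAE: hidden activations -> sparse codes over a dictionary."""
    def __init__(self, d_model: int, n_atoms: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_atoms)
        self.decoder = nn.Linear(n_atoms, d_model, bias=False)  # columns = atoms

    def forward(self, h: torch.Tensor):
        z = F.relu(self.encoder(h))   # sparse, nonnegative codes
        return self.decoder(z), z

def sae_loss(recon, h, z, l1_coeff: float = 1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparse codes."""
    return F.mse_loss(recon, h) + l1_coeff * z.abs().mean()

def ablate_atom(sae: SparseAutoencoder, h: torch.Tensor, atom_idx: int):
    """Zero one atom's code everywhere and compare reconstruction error."""
    with torch.no_grad():
        recon, z = sae(h)
        z_abl = z.clone()
        z_abl[:, atom_idx] = 0.0      # knock out a single dictionary atom
        recon_abl = sae.decoder(z_abl)
    return F.mse_loss(recon, h).item(), F.mse_loss(recon_abl, h).item()
```

For a localized SIREN atom the ablation should degrade reconstructions only where that atom fires; for a global FFMLP atom the degradation should spread across the whole signal, which is the pattern the paper's causal experiments report.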
For the broader AI community, these results shift focus from pure performance metrics toward understanding what neural networks actually learn and memorize. This mechanistic lens directly addresses contemporary concerns about generalization versus memorization in neural architectures. The research suggests that future INR and neural architecture design could intentionally steer toward generalizable representations rather than memorization, potentially improving transfer learning efficiency across domains. The tools developed here—stable rank analysis and SAE-based interpretation—offer transferable methodologies for analyzing other neural systems beyond INRs.
- Weight stable rank in shared encoder layers predicts the optimal freeze depth, matching or exceeding standard fine-tuning across experiments (see the sketch after this list)
- SIREN learns spatially localized dictionary atoms while FFMLP learns image-spanning atoms, despite the two achieving comparable cohort-fitting performance
- Ablating a single FFMLP atom can cause 10.6 dB PSNR drops across images, demonstrating the causal importance of the learned dictionary structures
- Sparse autoencoders provide mechanistic interpretability of INR activations, enabling direct inspection of learned representations
- Architecture-specific representational strategies suggest future INR design should optimize for generalization over memorization
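As referenced in the first takeaway, the freeze-then-fine-tune recipe itself is only a few lines of PyTorch. The following is a minimal sketch under stated assumptions (an nn.Sequential INR and a freeze depth chosen from a stable rank scan like the one above), not the paper's training code.

```python
import torch
import torch.nn as nn

def freeze_to_depth(inr: nn.Sequential, freeze_depth: int) -> None:
    """Freeze every submodule up to and including the chosen layer index."""
    for i, module in enumerate(inr):
        if i <= freeze_depth:
            for p in module.parameters():
                p.requires_grad_(False)

# Example: freeze the early, high-stable-rank encoder layers, then
# fine-tune only the remaining parameters on a new signal.
inr = nn.Sequential(
    nn.Linear(2, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3),
)
freeze_to_depth(inr, freeze_depth=1)  # illustrative depth from a stable rank scan
optimizer = torch.optim.Adam(
    (p for p in inr.parameters() if p.requires_grad), lr=1e-4)
```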