Dissociating spatial frequency reliance from adversarial robustness advantages in neurally guided deep convolutional neural networks
Researchers challenge the assumption that neural alignment improves adversarial robustness in deep learning models by reducing reliance on high-frequency image details. Their experiments reveal that spatial-frequency bias is likely a byproduct rather than the primary mechanism, suggesting robustness improvements stem from learning human-like visual representations through richer mechanisms than frequency filtering.
This research addresses a fundamental question in adversarial machine learning: why do deep neural networks aligned with human visual cortex representations demonstrate greater robustness to adversarial attacks? The prevailing hypothesis attributed this advantage to spectral bias—models learning to rely on low spatial frequencies (LSF) similar to human vision. However, this study systematically decouples spatial-frequency preference from neural alignment effects, revealing a surprising dissociation.
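To make the robustness question concrete, here is a minimal sketch of the gradient-sign (FGSM-style) perturbation commonly used in adversarial evaluations. The toy logistic model, weights, and step size are all illustrative assumptions, not taken from the study.

```python
import numpy as np

# Toy illustration of a one-step FGSM-style attack on a linear classifier.
# All names and values are illustrative, not from the paper.

def fgsm_perturb(x, w, b, y, eps):
    """One-step fast-gradient-sign perturbation for a logistic model.

    Loss: log(1 + exp(-y * (w.x + b))) with label y in {-1, +1};
    its input gradient is -y * w / (1 + exp(margin)), and FGSM
    moves eps along the sign of that gradient.
    """
    margin = y * (x @ w + b)
    grad = -y / (1.0 + np.exp(margin)) * w  # d(loss)/dx
    return x + eps * np.sign(grad)

rng = np.random.default_rng(0)
w = rng.normal(size=8)
x = w * 0.05                       # weakly correct example: w.x > 0
adv = fgsm_perturb(x, w, 0.0, y=1, eps=0.1)
margin_clean = x @ w
margin_adv = adv @ w               # strictly smaller: FGSM pushes x against the margin
```

With a large enough `eps`, the reduced margin crosses zero and the prediction flips; robustness is typically reported as accuracy under such perturbations at a fixed budget.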
The work builds on established findings that human object recognition depends critically on mid-frequency bands. Prior neuroscience research suggested that these bands were only partially captured by LSF-focused analyses, leaving their relative importance unclear. By directly manipulating models to favor specific frequency bands, the researchers could isolate causality rather than mere correlation.
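The frequency-band manipulation can be pictured as filtering inputs in the Fourier domain. The sketch below implements radial band-pass filtering with NumPy; the cutoffs and image size are arbitrary assumptions, and the study's actual steering procedure may operate differently (for instance, through training objectives rather than input filtering).

```python
import numpy as np

# Minimal sketch of restricting an image to a spatial-frequency band,
# the kind of manipulation used to bias a model toward chosen frequencies.

def bandpass(img, lo, hi):
    """Keep only Fourier components with radial frequency in [lo, hi)."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)   # radius from the DC component
    mask = (r >= lo) & (r < hi)
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

rng = np.random.default_rng(1)
img = rng.normal(size=(64, 64))
low = bandpass(img, 0, 8)      # low spatial frequencies (LSF) only
mid = bandpass(img, 8, 24)     # a mid band of the kind human recognition favors
```

Because the radial masks partition the spectrum, filtering is linear: the low band plus everything above it reconstructs the original image, which makes it easy to verify the filter and to train models on one band at a time.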
Their findings demonstrate that while neurally aligned models do shift toward LSF and human-like channel reliance, directly inducing these frequency biases in isolation fails to replicate the robustness gains. Spatial-frequency steering produces only modest improvements, despite inducing larger representational shifts than neural alignment achieves on its own. This suggests the robustness advantage emerges from acquiring human-like geometric structure in learned representations rather than from simple frequency filtering.
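One common way to quantify a "representational shift" is linear centered kernel alignment (CKA) between activation matrices; whether the study uses this exact metric is an assumption here, so treat the following as an illustrative measure rather than the paper's method.

```python
import numpy as np

# Hedged sketch: linear CKA between two (samples, features) activation
# matrices. Values near 1 mean similar representational geometry.

def linear_cka(X, Y):
    """Linear centered kernel alignment between activation matrices."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(2)
base = rng.normal(size=(100, 32))                         # representations before steering
rot = base @ np.linalg.qr(rng.normal(size=(32, 32)))[0]   # orthogonally rotated copy
noisy = base + rng.normal(size=base.shape)                # genuinely perturbed copy
```

CKA is invariant to orthogonal transforms, so the rotated copy scores ~1.0 while the noisy copy scores lower; a metric like this lets one compare the size of the shift induced by frequency steering against the shift induced by neural alignment.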
These results have implications for adversarial robustness research, suggesting that current approaches may be pursuing superficial correlates rather than causal mechanisms. The study redirects attention toward deeper representational properties, potentially involving hierarchical organization, feature disentanglement, or the invariance structures that human vision has evolved. Future adversarial training methods might benefit from directly optimizing these geometric properties rather than targeting frequency distributions as a proxy.
- Neural alignment improves adversarial robustness through human-like representational geometry, not spatial-frequency bias alone
- Direct manipulation of frequency preferences fails to replicate robustness gains achieved through neural-guided training
- Spatial-frequency reliance appears to be an emergent property rather than the causal mechanism driving improved robustness
- Current adversarial training approaches may be targeting superficial correlates instead of fundamental representational structures
- Future robustness research should focus on geometric properties of learned representations beyond frequency profiles