Coarse-to-fine Hierarchical Architecture with Sequential Mamba for Brain Reconstruction
Researchers introduce CHASMBrain, a hierarchical neural architecture using Mamba models to predict brain activity from images by mimicking the visual cortex's functional organization. The model achieves state-of-the-art performance on brain imaging datasets and reveals that different neural pathways specialize in processing semantic versus spatial information, advancing understanding of how artificial and biological vision systems align.
CHASMBrain connects artificial vision models to human brain function by building the visual cortex's organization into the architecture itself. The research addresses a fundamental question: how do deep learning architectures map onto the hierarchical organization of the visual cortex? By employing a dual-stream Mamba design that separates global semantic processing from local spatial feature extraction, the model explicitly mirrors known organizational principles of biological vision systems.
The coarse-to-fine approach works in two stages. Stage 1 produces region-of-interest (ROI) predictions that capture overall activation patterns, while Stage 2 refines these into voxel-level reconstructions using a Mamba-VAE. On the Natural Scenes Dataset, this hierarchical strategy reaches a Pearson correlation of 0.429 and an MSE of 0.261, surpassing traditional methods such as ridge regression as well as contemporary vision models.
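The two metrics quoted above can be illustrated with a small pure-Python sketch. It assumes predictions and measurements are stored as one response series per voxel (one value per image) and that scores are averaged across voxels; the function names, the averaging scheme, and the toy numbers are assumptions for illustration, not the authors' evaluation code.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; this sketch reports 0
    return cov / math.sqrt(var_x * var_y)

def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def evaluate(pred, true):
    """pred, true: lists of voxels, each a list of responses over images.
    Returns (mean per-voxel Pearson r, mean per-voxel MSE)."""
    rs = [pearson(p, t) for p, t in zip(pred, true)]
    errs = [mse(p, t) for p, t in zip(pred, true)]
    return sum(rs) / len(rs), sum(errs) / len(errs)

# Toy data: 2 voxels x 4 images (made-up numbers, not from the paper)
true = [[0.1, 0.4, 0.3, 0.9], [1.0, 0.2, 0.5, 0.7]]
pred = [[0.2, 0.5, 0.2, 0.8], [0.9, 0.3, 0.4, 0.6]]
mean_r, mean_mse = evaluate(pred, true)
print(f"mean Pearson r = {mean_r:.3f}, mean MSE = {mean_mse:.3f}")
```

Pearson correlation rewards getting the pattern of responses across images right regardless of scale, while MSE also penalizes offsets in magnitude, which is why the two are usually reported together.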
The causal ablation studies provide insight beyond prediction accuracy. Rather than inferring a link between model components and brain regions from correlation alone, the researchers show that the patch stream causally drives predictions for early retinotopic regions, while the semantic (CLS) stream contributes to higher-order visual areas. This asymmetric specialization mirrors known neuroscience and supports the claim that the model's internal structure reflects cortical organization.
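The logic of a stream ablation can be sketched with a toy example. The code below assumes a hypothetical two-stream linear readout and zeroes one stream at a time, then measures how much the correlation with the (simulated) measured responses drops for each region. The weights, region names, noise level, and the zeroing procedure are all illustrative assumptions; the article does not specify how the authors implement their ablations.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; this sketch reports 0
    return cov / math.sqrt(var_x * var_y)

def predict(cls_feat, patch_feat, weights, ablate=None):
    """Toy readout: weighted sum of the two streams.
    `ablate` ("cls" or "patch") zeroes that stream's input."""
    w_cls, w_patch = weights
    out = []
    for c, p in zip(cls_feat, patch_feat):
        if ablate == "cls":
            c = 0.0
        elif ablate == "patch":
            p = 0.0
        out.append(w_cls * c + w_patch * p)
    return out

random.seed(0)
n_images = 500
cls_feat = [random.gauss(0, 1) for _ in range(n_images)]
patch_feat = [random.gauss(0, 1) for _ in range(n_images)]

# Hypothetical (w_cls, w_patch) weights: the early region leans on the
# patch stream, the higher-order region on the CLS stream.
regions = {"early": (0.1, 1.0), "higher": (1.0, 0.1)}

drops = {}
for name, weights in regions.items():
    clean = predict(cls_feat, patch_feat, weights)
    # Simulated "measured" responses: model output plus Gaussian noise.
    measured = [v + random.gauss(0, 0.5) for v in clean]
    full_r = pearson(clean, measured)
    drops[name] = {
        stream: full_r - pearson(
            predict(cls_feat, patch_feat, weights, ablate=stream), measured
        )
        for stream in ("cls", "patch")
    }

for name, d in drops.items():
    print(name, {s: round(v, 3) for s, v in d.items()})
```

In this toy setup, removing the patch stream hurts the early region far more than removing the CLS stream, and the reverse holds for the higher-order region. That asymmetry in accuracy drops is the kind of evidence an ablation study relies on.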
Cross-subject generalization tests indicate that the model learns subject-agnostic visual representations, suggesting that biological vision systems process visual information in broadly shared ways. This finding has implications for understanding human perception and could inform the design of more biologically plausible artificial neural networks. The research exemplifies how neuroscience-inspired architecture can improve both model performance and interpretability.
- CHASMBrain achieves state-of-the-art fMRI prediction by explicitly modeling the visual cortex's functional organization through a dual-stream Mamba architecture
- Causal ablation experiments demonstrate asymmetric specialization: the patch stream drives early visual areas, while the semantic (CLS) stream supports higher-order processing
- The model generalizes across subjects with minimal adaptation, indicating universal principles in biological vision representation
- The hierarchical coarse-to-fine prediction strategy outperforms single-stage baselines and contemporary vision models on brain imaging benchmarks
- Architecture design principles derived from neuroscience improve both the predictive accuracy and the biological interpretability of neural networks