#state-space-models News & Analysis

36 articles tagged with #state-space-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

From Markov to Laplace: How Mamba In-Context Learns Markov Chains

Researchers demonstrate that Mamba, a state space model alternative to transformers, efficiently learns optimal statistical estimators for Markov chains through in-context learning. The study reveals that single-layer Mamba discovers the Laplacian smoothing estimator—which is both Bayes and minimax optimal—and theoretically explains this capability through convolution-based representation.
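For readers unfamiliar with the estimator named in this summary, here is a minimal sketch of Laplacian (add-beta) smoothing for Markov chain transition probabilities. It is not the paper's code: the function name, the choice `beta=1.0`, and the toy sequence are illustrative assumptions.

```python
def laplace_transition_estimates(sequence, num_states, beta=1.0):
    """Estimate P(next | prev) with add-beta smoothing.

    Hypothetical helper for illustration; beta=1.0 gives classic
    add-one (Laplace) smoothing.
    """
    # Count observed transitions prev -> next.
    counts = [[0] * num_states for _ in range(num_states)]
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1

    # Smooth each row: (count + beta) / (row total + beta * num_states).
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([(c + beta) / (total + beta * num_states) for c in row])
    return probs


# Toy binary sequence (made up for this example).
seq = [0, 1, 1, 0, 1, 0, 0, 1]
est = laplace_transition_estimates(seq, num_states=2)
```

Because every count is shifted by `beta`, transitions that never appear in the context still receive nonzero probability, and each row still sums to one.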

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

Zamba2-VL Technical Report

Zyphra released Zamba2-VL, a suite of vision-language models combining Mamba2 state-space layers with transformer blocks, achieving performance competitive with leading VLMs while delivering 10x faster time-to-first-token. The three released models (1.2B, 2.7B, and 7B parameters) represent a significant efficiency gain for edge and on-device deployment.

🏢 Hugging Face

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

Efficient Learning of Deep State Space Models via Importance Smoothing

Researchers introduce Parallel Variational Monte Carlo (PVMC), a novel training method for deep state space models that combines strengths of variational and sequential Monte Carlo approaches. The technique achieves comparable or superior performance to existing methods while running 10x faster, addressing a critical scalability bottleneck in training complex temporal models.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

CaMBRAIN: Real-time, Continuous EEG Inference with Causal State Space Models

Researchers introduce CaMBRAIN, a causal state space model based on Mamba architecture that enables real-time, continuous EEG signal processing with linear-time complexity. The model achieves state-of-the-art results across multiple datasets while processing signals >10x faster than existing attention-based methods, overcoming critical limitations in handling variable-length brain activity recordings.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference

Researchers propose a sleep-like mechanism for transformer language models that periodically consolidates context into persistent fast weights, reducing the computational burden of long sequences. The method shifts heavy computation offline while maintaining fast inference speeds, showing significant improvements on reasoning tasks that standard transformers struggle with.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling

Researchers introduce MP-SSM, a novel framework that integrates State-Space Model principles into message-passing neural networks for improved graph learning. The approach achieves permutation equivariance, computational efficiency, and long-range information propagation while enabling theoretical analysis of gradient flow and information dynamics across deep networks.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Sparse Prefix Caching for Hybrid and Recurrent LLM Serving

Researchers propose sparse prefix caching, a novel optimization technique for hybrid and recurrent LLM serving that stores exact states at checkpoint positions rather than caching entire token histories. The method uses dynamic programming to determine optimal checkpoint placement and demonstrates superior performance on real-world datasets while using fewer checkpoints than existing dense caching approaches.

AI · Bullish · arXiv – CS AI · May 7 · 7/10

RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction

RetentiveKV introduces an entropy-driven optimization method for multimodal large language models that achieves 5x KV cache compression and 1.5x decoding acceleration by reformulating token eviction as continuous memory evolution rather than discrete pruning. The approach addresses limitations of existing compression methods by accounting for visual tokens that gain importance later in decoding and preserving spatial continuity of visual information.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

Separators in Enhancing Autoregressive Pretraining for Vision Mamba

Researchers introduce STAR, a new autoregressive pretraining method for Vision Mamba that uses separators to quadruple input sequence length while maintaining image dimensions. The STAR-B model achieved 83.5% accuracy on ImageNet-1k, demonstrating improved performance through better utilization of long-range dependencies in computer vision tasks.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Architectural Proprioception in State Space Models: Thermodynamic Training Induces Anticipatory Halt Detection

Researchers introduce the Probability Navigation Architecture (PNA) framework that trains State Space Models with thermodynamic principles, discovering that SSMs develop 'architectural proprioception' - the ability to predict when to stop computation based on internal state entropy. This breakthrough shows SSMs can achieve computational self-awareness while Transformers cannot, with significant implications for efficient AI inference systems.

AI · Neutral · arXiv – CS AI · Mar 4 · 7/10 · 3

Retrievit: In-context Retrieval Capabilities of Transformers, State Space Models, and Hybrid Architectures

Research compares Transformers, State Space Models (SSMs), and hybrid architectures for in-context retrieval tasks, finding hybrid models excel at information-dense retrieval while Transformers remain superior for position-based tasks. SSM-based models develop unique locality-aware embeddings that create interpretable positional structures, explaining their specific strengths and limitations.

AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6

Decision MetaMamba: Enhancing Selective SSM in Offline RL with Heterogeneous Sequence Mixing

Researchers propose Decision MetaMamba (DMM), a new AI model architecture that improves offline reinforcement learning by addressing information loss issues in Mamba-based models. The solution uses a dense layer-based sequence mixer and modified positional structure to achieve state-of-the-art performance with fewer parameters.

AI · Bullish · Synced Review · May 28 · 7/10 · 4

Adobe Research Unlocking Long-Term Memory in Video World Models with State-Space Models

Adobe Research has developed a breakthrough approach to video generation that solves long-term memory challenges by combining State-Space Models (SSMs) with dense local attention mechanisms. The researchers used advanced training strategies including diffusion forcing and frame local attention to achieve coherent long-range video generation.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

MS-rPPG: Multi-spectral State Space Model for Remote Photoplethysmography in Driver Monitoring Systems

Researchers introduce MS-rPPG, a multi-spectral framework combining RGB and near-infrared video for remote heart rate estimation in driver monitoring systems. The method uses a novel state space model (MS-Mamba) to improve accuracy under challenging driving conditions with varying lighting and head movements, validated on real-world datasets.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

An approach with Visual and Tabular Mamba to multimodal medical data using Mixed Fusion

Researchers propose a Mamba-based architecture for multimodal medical data fusion that combines visual and tabular processing to improve cancer classification interpretability. Testing on skin and oral cancer datasets shows competitive performance with enhanced explainability through SHAP analysis, positioning state space models as viable alternatives to Transformers in medical AI applications.

AI · Neutral · arXiv – CS AI · Jun 19 · 6/10

SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models

Researchers introduce SL-S4Wave, a self-supervised learning framework combining contrastive learning with structured state space models to analyze physiological waveforms like ECGs and EEGs. The approach outperforms existing methods in detecting arrhythmias, requires fewer labeled examples, and generalizes effectively across different cardiac conditions and brain signals.

AI · Bullish · arXiv – CS AI · Jun 19 · 6/10

RoboSSM: Scalable In-context Imitation Learning via State-Space Models

Researchers introduce RoboSSM, a new in-context imitation learning framework that replaces Transformers with state-space models (SSMs) for robotic task learning. The approach demonstrates superior performance on long-context prompts and achieves better generalization to unseen tasks compared to Transformer-based methods, establishing SSMs as a viable alternative backbone for robot learning systems.

AI · Bullish · arXiv – CS AI · Jun 11 · 6/10

Lung-SRAD: Spectral-Aware Regularized Audio DASS with Dual-Axis Patch-Mix Contrastive Learning for Respiratory Sound Classification

Researchers introduce Lung-SRAD, a novel respiratory sound classification system using State Space Models instead of traditional transformer architectures, achieving 64.48% accuracy on the ICBHI benchmark—a 5% improvement over the Audio Spectrogram Transformer baseline. The approach combines spectral-aware regularization with dual-axis patch-mix contrastive learning to better detect localized abnormal respiratory patterns.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Blurry Window Attention

Researchers introduce Blurry Window Attention (BLA), a novel attention mechanism that addresses the quadratic complexity and memory limitations of traditional Transformer models by reconstructing sparse key-value history through Dirichlet kernel interpolation. BLA demonstrates 8x state efficiency improvements over sliding window attention while maintaining competitive performance on information retrieval tasks, positioning it as a viable alternative for long-context language modeling.

🏢 Perplexity

AI · Neutral · arXiv – CS AI · Jun 4 · 5/10

SFMambaNet: Spectral-Frequency Enhanced Selective State Space Model for Correspondence Pruning

Researchers introduce SFMambaNet, a novel deep learning architecture that combines spectral-frequency analysis with Mamba-based state space models to improve correspondence pruning—the task of filtering accurate feature matches from noisy initial sets. The method outperforms existing Graph Neural Network approaches by integrating frequency domain perception to better distinguish valid correspondences from outliers.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

Learning Long Range Spatio-Temporal Representations over Continuous Time Dynamic Graphs with State Space Models

Researchers introduce CTDG-SSM, a novel state-space modeling framework for continuous-time dynamic graphs that captures long-range temporal and spatial patterns through a topology-aware memory mechanism. The approach achieves state-of-the-art results on dynamic link prediction, node classification, and sequence classification benchmarks, particularly excelling on datasets requiring long-range reasoning.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

EnergyMamba: An Uncertainty-Aware Graph-Enhanced Selective State Space Model for Energy Consumption Prediction

Researchers introduce EnergyMamba, a machine learning framework that combines graph neural networks with state-space models to predict energy consumption while quantifying prediction uncertainty. The system achieves a 5% accuracy improvement over existing methods by simultaneously modeling spatial grid relationships and temporal patterns, with enhanced reliability during abnormal conditions such as extreme weather.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

Forget Attention: Importance-Aware Attention Is All You Need

Researchers propose SISA (SSM-Informed Softmax Attention), a hybrid architecture that integrates state space model importance signals directly into transformer attention mechanisms at the score level. The approach achieves superior performance on language modeling benchmarks, particularly excelling at long-context retrieval tasks while maintaining computational efficiency through standard operations.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Computation-Aware Kalman Filtering with Model Selection for Neural Dynamics

Researchers introduce CASSM, a Bayesian framework that combines Kalman filtering with model selection to improve neural dynamics modeling on modern datasets. The method addresses computational complexity and uncertainty calibration challenges, offering competitive performance with deep networks while maintaining better uncertainty quantification, particularly for datasets with fewer trials than recorded neurons.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

RPCASSM: Robust PCA State Space Model For Infrared Small Target Detection

Researchers introduce RPCASSM, a novel deep learning architecture for detecting small infrared targets by combining robust principal component analysis with state space models. The approach addresses limitations of existing vision models by designing specialized modules to separately process background and target information, improving edge detection accuracy for surveillance and maritime applications.
