#sequence-modeling News & Analysis

24 articles tagged with #sequence-modeling. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Beyond Item IDs: Scaling Short-Form-Video Recommendation via Semantic-Native Long Sequence Modeling

Researchers present a production-deployed recommendation system that serves short-form video suggestions at billion-user scale by replacing traditional video IDs with semantic-native representations and introducing a compression transformer to reduce computational complexity. The framework achieves order-of-magnitude improvements in memory efficiency and enables longer user behavior sequences, delivering measurable gains in user engagement and content consumption metrics.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

Decoupling the "What" and "Where" With Polar Coordinate Positional Embeddings

Researchers present Polar Coordinate Positional Embeddings (PoPE), an improvement on rotary position embeddings (RoPE) that decouples content matching from positional matching in Transformer attention. PoPE demonstrates superior performance on language modeling, music, and genomic sequence tasks while achieving strong zero-shot length extrapolation without additional fine-tuning.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers

Researchers introduce Tensor Memory, a fixed-size recurrent module that augments Transformers with persistent 3D spatial state for improved long-sequence processing. The approach enables better video understanding and occlusion reasoning by decoupling memory capacity from input length while maintaining computational efficiency.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling

Researchers introduce MP-SSM, a novel framework that integrates State-Space Model principles into message-passing neural networks for improved graph learning. The approach achieves permutation equivariance, computational efficiency, and long-range information propagation while enabling theoretical analysis of gradient flow and information dynamics across deep networks.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Kaczmarz Linear Attention

Researchers propose Kaczmarz Linear Attention (KLA), an improved algorithm for long-context language modeling that replaces empirically learned coefficients with mathematically derived, key-norm-normalized step sizes. KLA outperforms existing linear attention baselines such as Gated DeltaNet while maintaining computational efficiency and enabling stable processing of contexts of up to 65K tokens.
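
To make "key-norm-normalized step size" concrete, here is a minimal sketch of the classical Kaczmarz projection applied to a key-value memory matrix. It is an illustration under stated assumptions, not the KLA update from the paper: the helper names `kaczmarz_step` and `read`, the `eps` stabilizer, and the toy vectors are all hypothetical.

```python
def kaczmarz_step(S, k, v, eps=1e-8):
    """One Kaczmarz-style update of a (d_v x d_k) state S so that S @ k moves to v.

    Hypothetical helper: shows the classical Kaczmarz projection, where the
    step size is 1 / ||k||^2, not the exact rule used by KLA.
    """
    norm_sq = sum(x * x for x in k) + eps  # squared key norm sets the step size
    new_S = []
    for row, target in zip(S, v):
        pred = sum(s * x for s, x in zip(row, k))  # current readout for this row
        err = target - pred                        # residual to correct
        new_S.append([s + err * x / norm_sq for s, x in zip(row, k)])
    return new_S


def read(S, q):
    """Read the memory with a query vector: returns S @ q."""
    return [sum(s * x for s, x in zip(row, q)) for row in S]


# Toy example: store one key-value pair in an empty 2x2 state.
S = [[0.0, 0.0], [0.0, 0.0]]
k, v = [3.0, 4.0], [1.0, -2.0]
S = kaczmarz_step(S, k, v)
out = read(S, k)  # close to v, regardless of the scale of k
```

Because the correction is divided by the squared key norm, a single step lands the readout for `k` on `v` (up to `eps`) no matter how large the key is, which is the property a fixed or learned scalar step size does not guarantee.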

AI · Bullish · arXiv – CS AI · Apr 20 · 7/10

CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling

Researchers introduce CoMeT (Collaborative Memory Transformer), a novel architecture that enables large language models to process arbitrarily long sequences with constant memory usage and linear time complexity. The system uses a dual-memory approach with FIFO queues and gated updates, demonstrating remarkable performance on long-context tasks including 1M token sequences and real-world applications.

AI · Bullish · arXiv – CS AI · Mar 5 · 6/10

Separators in Enhancing Autoregressive Pretraining for Vision Mamba

Researchers introduce STAR, a new autoregressive pretraining method for Vision Mamba that uses separators to quadruple input sequence length while maintaining image dimensions. The STAR-B model achieved 83.5% accuracy on ImageNet-1k, demonstrating improved performance through better utilization of long-range dependencies in computer vision tasks.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

Topological Neural Dynamics: A Neuron-wise Framework for Sequence Modeling

Researchers introduce Topological Neural Dynamics (TND), a novel sequence modeling framework that replaces traditional layer-wise neural computation with neuron-wise dynamics where individual neurons evolve independently through explicit graph topology. In a Pong behavior cloning benchmark, TND outperforms RNNs, LSTMs, continuous-time networks, and Transformers with a catch rate more than three times higher than the strongest baseline, suggesting this architectural approach offers a more effective inductive bias for sequence modeling.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RL

Researchers introduce Chain-of-Goals Hierarchical Policy (CoGHP), a novel framework that applies chain-of-thought reasoning to offline reinforcement learning by autoregressively generating sequences of intermediate subgoals to solve long-horizon tasks. The unified architecture demonstrates consistent performance improvements over existing hierarchical baselines on navigation and manipulation benchmarks.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Multi-Rate Mixture of Experts for Accelerating Liquid Neural Network Training

Researchers propose Multi-Rate Mixture-of-Experts (MR-MoE), a framework that enhances Liquid Neural Networks for time-series modeling by deploying multiple experts operating at different time scales with adaptive gating. The approach combines continuous-time dynamics, multi-scale decomposition, and attention mechanisms to outperform traditional RNNs and monolithic LNNs on complex multivariate time-series tasks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Q-Delta: Beyond Key-Value Associative State Evolution

Q-Delta presents a novel approach to linear attention mechanisms in sequence modeling by integrating query-conditioned state evolution, moving beyond traditional key-value associative paradigms. The method combines efficient linear-time inference with improved performance on language modeling and long-context retrieval tasks through a hardware-optimized implementation.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Pretraining Recurrent Networks without Recurrence

Researchers propose Supervised Memory Training (SMT), a novel method for training recurrent neural networks that replaces sequential backpropagation through time with parallel, supervised learning on memory state transitions. By leveraging a Transformer encoder to generate training labels, SMT achieves stable gradient propagation and improved performance on language and sequence modeling tasks without the parallelism constraints of traditional RNN training.

AI · Bullish · arXiv – CS AI · Jun 4 · 6/10

Scaling Novel Graph Generation via Lightweight Structure-Guided Autoregressive Models

Researchers propose a lightweight autoregressive framework for graph generation that achieves near log-linear complexity by using structure-guided topological ordering, addressing scalability limitations in current diffusion and autoregressive models. The two-phase training strategy reduces overfitting and promotes novel graph generation while maintaining validity, with applications spanning molecular discovery, circuit design, and cybersecurity.

AI · Neutral · arXiv – CS AI · Jun 4 · 6/10

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

Researchers introduce MesaNet, an improved recurrent neural network architecture that optimizes sequence modeling through test-time training, achieving better language modeling performance than previous RNNs while requiring additional inference-time compute. The work advances the trend toward linearized transformers that maintain constant memory costs during inference, and it highlights the trade-off between computational efficiency and performance gains.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

SHARP: Sleep-based Hierarchical Accelerated Replay for Long Range Non-Stationary Temporal Pattern Recognition

Researchers introduce SHARP, a neural network framework designed to recognize long-range temporal patterns in streaming data by combining a memory module with a pattern-recognition module, inspired by sleep-based memory consolidation in mammals. The approach achieves better performance than recurrent neural networks and transformers on benchmark datasets while maintaining computational efficiency through hierarchical processing.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

Enhancing BiGRU with a KAN Block for Legal Document Classification and Summarization

Researchers have developed a novel neural architecture combining Kolmogorov-Arnold Networks (KAN) with BiGRU models for classifying and summarizing legal documents in multilingual, low-resource settings. Tested on Bengali, English, and transliterated Bengali legal documents from Bangladesh, the hybrid model achieved 67.96% classification accuracy while demonstrating that KAN integration improved performance by over 10 percentage points.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Learning to Remember, Learn, and Forget in Attention-Based Models

Researchers propose Palimpsa, a self-attention model that frames in-context learning as a continual learning problem using Bayesian metaplasticity to overcome memory interference in long sequences. The framework unifies existing gated linear attention models as special cases and demonstrates improved performance on associative recall and reasoning tasks, offering a theoretical foundation for enhancing memory capacity in transformer-based architectures.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

FRACTAL: SSM with Fractional Recurrent Architecture for Computational Temporal Analysis of Long Sequences

Researchers introduce FRACTAL, a novel state space model architecture that integrates fractional measure theory to improve long-sequence modeling by balancing short-term sensitivity with long-term memory retention. The approach achieves 87.11% on the Long Range Arena benchmark, outperforming existing SSMs such as S5 and addressing a fundamental trade-off in temporal sequence analysis.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

UxSID: Semantic-Aware User Interests Modeling for Ultra-Long Sequence

UxSID is a new machine learning framework that models long user behavior sequences using semantic grouping and dual-level attention, achieving state-of-the-art performance with a 0.337% revenue lift in large-scale advertising tests. The approach balances computational efficiency with semantic awareness by using Semantic IDs rather than item-specific search methods.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

TIDES: Implicit Time-Awareness in Selective State Space Models

Researchers introduce TIDES, a new selective state space model architecture that combines the expressivity of input-dependent models like Mamba with the native irregular time-series handling of continuous-time models like S5. By moving input-dependence to the state matrix rather than the discretization step, TIDES maintains the physical meaning of time intervals while preserving per-token expressivity, achieving state-of-the-art results on time-series benchmarks.

AI · Neutral · arXiv – CS AI · May 4 · 6/10

Caracal: Causal Architecture via Spectral Mixing

Researchers introduce Caracal, a novel architecture that replaces attention mechanisms with a parameter-efficient Multi-Head Fourier module to improve LLM scalability for long sequences. The approach achieves O(L log L) complexity using Fast Fourier Transform, implements frequency-domain causal masking for autoregressive generation, and uses standard library operators for broad deployment compatibility.
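
As a rough illustration of how FFT-based mixing can stay causal at O(L log L) cost, the sketch below uses the standard zero-padding trick for causal convolution. This is a swapped-in textbook technique, not Caracal's Multi-Head Fourier module or its frequency-domain masking, whose details the summary does not give. The names `fft`, `causal_fft_mix`, and `causal_direct`, and the toy filter `h`, are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for i in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + tw
        out[i + n // 2] = even[i] - tw
    return out


def causal_fft_mix(x, h):
    """Causal mixing y[t] = sum_{s<=t} h[s] * x[t-s], computed via FFT.

    Zero-padding to at least 2L avoids circular wrap-around, so no future
    token leaks into position t. Assumes len(h) <= len(x).
    """
    L = len(x)
    n = 1
    while n < 2 * L:
        n *= 2
    X = fft([complex(a) for a in x] + [0j] * (n - L))
    H = fft([complex(a) for a in h] + [0j] * (n - len(h)))
    Y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [(val / n).real for val in Y[:L]]  # divide by n to finish the inverse


def causal_direct(x, h):
    """O(L^2) reference implementation of the same causal mixing."""
    return [sum(h[s] * x[t - s] for s in range(t + 1)) for t in range(len(x))]


x = [1.0, 2.0, 3.0, 4.0]
h = [0.5, 0.25, 0.0, -1.0]
y_fft = causal_fft_mix(x, h)
y_ref = causal_direct(x, h)  # matches y_fft up to floating-point error
```

Changing the last entry of `x` leaves the first three outputs untouched, which is the causality property autoregressive generation needs.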

AI · Neutral · arXiv – CS AI · Mar 2 · 6/10 · 11

Memory Caching: RNNs with Growing Memory

Researchers introduce Memory Caching (MC), a technique that enhances recurrent neural networks by allowing their memory capacity to grow with sequence length, bridging the gap between fixed-memory RNNs and growing-memory Transformers. The approach offers four variants and shows competitive performance with Transformers on language modeling and long-context tasks while maintaining better computational efficiency.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 8

Deep Sequence Modeling with Quantum Dynamics: Language as a Wave Function

Researchers introduce a quantum-inspired sequence modeling framework that uses complex-valued wave functions and quantum interference for language processing. The approach shows theoretical advantages over traditional recurrent neural networks by utilizing quantum dynamics and the Born rule for token probability extraction.
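
The Born rule itself is simple to state: a token's probability is the squared magnitude of its complex amplitude, normalized over the vocabulary. The sketch below shows only that readout step; the function name `born_probabilities` and the three-token amplitude vector are hypothetical, and how the paper's model produces its amplitudes is not covered here.

```python
def born_probabilities(amplitudes):
    """Map complex amplitudes to probabilities: p_i = |a_i|^2 / sum_j |a_j|^2."""
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    if total == 0:
        raise ValueError("state vector has zero norm")
    return [w / total for w in weights]


# Hypothetical amplitudes over a three-token vocabulary.
psi = [1 + 1j, 1j, 0.5]
probs = born_probabilities(psi)  # squared magnitudes 2, 1, 0.25, then normalized
```

Unlike a softmax over real logits, amplitudes can cancel before the squaring step: adding `1` and `-1` for the same token gives zero probability mass, which is the interference effect the summary refers to.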

AI · Bullish · arXiv – CS AI · Mar 3 · 5/10 · 5

Efficient Long-Sequence Diffusion Modeling for Symbolic Music Generation

Researchers developed SMDIM, a new diffusion model for symbolic music generation that efficiently handles long sequences by combining global structure construction with local refinement. The model outperforms existing approaches in both generation quality and computational efficiency across various musical styles including Western classical, popular, and folk music.
