AI · Bullish · arXiv – CS AI · 3d ago · 7/10
🧠Researchers introduce Tensor Memory, a fixed-size recurrent module that augments Transformers with persistent 3D spatial state for improved long-sequence processing. The approach enables better video understanding and occlusion reasoning by decoupling memory capacity from input length while maintaining computational efficiency.
AI · Bullish · arXiv – CS AI · 4d ago · 7/10
🧠Researchers introduce MP-SSM, a novel framework that integrates State-Space Model principles into message-passing neural networks for improved graph learning. The approach achieves permutation equivariance, computational efficiency, and long-range information propagation while enabling theoretical analysis of gradient flow and information dynamics across deep networks.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers propose Kaczmarz Linear Attention (KLA), an improved algorithm for long-context language modeling that replaces empirically-learned coefficients with mathematically-derived key-norm-normalized step sizes. KLA outperforms existing linear attention baselines like Gated DeltaNet while maintaining computational efficiency and enabling stable processing of up to 65K token contexts.
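To make the "key-norm-normalized step size" idea concrete, here is a minimal sketch of a classical Kaczmarz projection applied to a linear-attention state matrix. It is an illustration under assumptions, not the KLA authors' implementation: the names `kaczmarz_step` and `read`, the `eps` stabilizer, and the plain-list matrix layout are all hypothetical.

```python
def kaczmarz_step(S, k, v, eps=1e-8):
    """One normalized delta-rule update (classical Kaczmarz projection).

    S is a (d_v x d_k) state matrix stored as a list of rows, k is a key
    and v is the value to associate with it. The step size 1 / ||k||^2 is
    derived from the key norm instead of being a learned coefficient.
    `eps` is an assumed stabilizer for near-zero keys.
    """
    norm_sq = sum(x * x for x in k) + eps
    # Current prediction S @ k, one entry per state row.
    pred = [sum(s * kj for s, kj in zip(row, k)) for row in S]
    # Rank-one correction: S + (v - S k) k^T / ||k||^2
    return [
        [s + (vi - pi) * kj / norm_sq for s, kj in zip(row, k)]
        for row, vi, pi in zip(S, v, pred)
    ]


def read(S, q):
    """Query the state: returns S @ q."""
    return [sum(s * qj for s, qj in zip(row, q)) for row in S]


state = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
key, value = [1.0, 2.0, 2.0], [3.0, -1.0]
state = kaczmarz_step(state, key, value)
recalled = read(state, key)  # approximately equal to `value`
```

With this step size a single update makes the state reproduce the stored value for that key, up to the effect of `eps`, regardless of the key's scale.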
AI · Bullish · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers introduce CoMeT (Collaborative Memory Transformer), a novel architecture that enables large language models to process arbitrarily long sequences with constant memory usage and linear time complexity. The system uses a dual-memory approach with FIFO queues and gated updates, demonstrating remarkable performance on long-context tasks including 1M token sequences and real-world applications.
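As a rough illustration of how a FIFO queue and a gated update can keep memory constant regardless of sequence length, consider the sketch below. Everything in it is an assumption made for illustration: the class name `DualMemory`, the scalar `gate`, and the use of per-chunk summary vectors are hypothetical and are not taken from CoMeT itself.

```python
from collections import deque


class DualMemory:
    """Toy constant-size memory: a FIFO of recent summaries plus one
    persistent vector updated through a gate. Hypothetical structure."""

    def __init__(self, queue_len, dim):
        # Bounded FIFO: appending to a full deque evicts the oldest entry.
        self.recent = deque(maxlen=queue_len)
        self.persistent = [0.0] * dim

    def update(self, chunk_summary, gate):
        """Store the newest summary and blend it into persistent memory.

        gate in [0, 1]: 1 keeps the old memory, 0 overwrites it.
        """
        self.recent.append(list(chunk_summary))
        self.persistent = [
            gate * m + (1.0 - gate) * x
            for m, x in zip(self.persistent, chunk_summary)
        ]


mem = DualMemory(queue_len=2, dim=2)
for summary in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    mem.update(summary, gate=0.5)
# The queue holds only the two most recent summaries, however many
# chunks were processed, so storage does not grow with input length.
```

Each chunk costs a fixed amount of work, which is where a linear-time, constant-memory profile would come from in a design of this shape.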
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
🧠Researchers introduce STAR, a new autoregressive pretraining method for Vision Mamba that uses separators to quadruple input sequence length while maintaining image dimensions. The STAR-B model achieved 83.5% accuracy on ImageNet-1k, demonstrating improved performance through better utilization of long-range dependencies in computer vision tasks.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce FRACTAL, a novel state space model architecture that integrates fractional measure theory to balance short-term sensitivity against long-term memory retention, a fundamental trade-off in long-sequence modeling. The approach achieves 87.11% on the Long Range Arena benchmark, outperforming existing SSMs such as S5.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠UxSID is a new machine learning framework that models long user behavior sequences using semantic grouping and dual-level attention, achieving state-of-the-art performance with a 0.337% revenue lift in large-scale advertising tests. The approach balances computational efficiency with semantic awareness by using Semantic IDs rather than item-specific search methods.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce TIDES, a new selective state space model architecture that combines the expressivity of input-dependent models like Mamba with the native irregular time-series handling of continuous-time models like S5. By moving input-dependence to the state matrix rather than the discretization step, TIDES maintains the physical meaning of time intervals while preserving per-token expressivity, achieving state-of-the-art results on time-series benchmarks.
AI · Neutral · arXiv – CS AI · May 4 · 6/10
🧠Researchers introduce Caracal, a novel architecture that replaces attention mechanisms with a parameter-efficient Multi-Head Fourier module to improve LLM scalability for long sequences. The approach achieves O(L log L) complexity using Fast Fourier Transform, implements frequency-domain causal masking for autoregressive generation, and uses standard library operators for broad deployment compatibility.
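The O(L log L) figure comes from replacing pairwise token mixing with an FFT-based product in the frequency domain. The sketch below shows the generic technique only: a causal convolution computed through a radix-2 FFT, with zero-padding standing in for causal masking so that no output position sees later inputs. It is not Caracal's Multi-Head Fourier module; `fft` and `causal_conv_fft` are hypothetical helper names, and the kernel is assumed to be no longer than the signal.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.
    The inverse transform is left unscaled (caller divides by n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for i in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * i / n) * odd[i]
        out[i] = even[i] + tw
        out[i + n // 2] = even[i] - tw
    return out


def causal_conv_fft(signal, kernel):
    """y[t] = sum over s <= t of kernel[s] * signal[t - s], in O(L log L).

    Padding both inputs to a power of two >= 2L prevents circular
    wraparound, so y[t] depends only on signal[0..t].
    """
    L = len(signal)
    n = 1
    while n < 2 * L:
        n *= 2
    a = fft([complex(v) for v in signal] + [0j] * (n - L))
    b = fft([complex(v) for v in kernel] + [0j] * (n - len(kernel)))
    full = fft([p * q for p, q in zip(a, b)], inverse=True)
    return [(v / n).real for v in full[:L]]


y = causal_conv_fft([1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 0.25, 0.125])
```

Changing the last input element leaves every earlier output unchanged, which is the property autoregressive generation needs.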
AI · Neutral · arXiv – CS AI · Mar 2 · 6/10 · 11
🧠Researchers introduce Memory Caching (MC), a technique that enhances recurrent neural networks by allowing their memory capacity to grow with sequence length, bridging the gap between fixed-memory RNNs and growing-memory Transformers. The approach offers four variants and shows competitive performance with Transformers on language modeling and long-context tasks while maintaining better computational efficiency.
AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 8
🧠Researchers introduce a quantum-inspired sequence modeling framework that uses complex-valued wave functions and quantum interference for language processing. The approach shows theoretical advantages over traditional recurrent neural networks by utilizing quantum dynamics and the Born rule for token probability extraction.
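For readers unfamiliar with the Born rule, the sketch below shows how complex amplitudes would turn into token probabilities, and how interference changes the result. The function name `born_probabilities` and the two-path, two-token setup are hypothetical; the paper's actual parameterization is not described in this summary.

```python
def born_probabilities(amplitudes):
    """Born rule: probability is proportional to |amplitude|^2,
    normalized here so the values sum to 1."""
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical "paths" each assign a complex amplitude to two tokens.
path_a = [1 + 0j, 1 + 0j]
path_b = [1 + 0j, -1 + 0j]

# Amplitudes are summed BEFORE squaring, so they can reinforce or cancel.
combined = [a + b for a, b in zip(path_a, path_b)]
probs = born_probabilities(combined)  # token 0 reinforced, token 1 cancelled
```

Averaging the two paths' probabilities instead would give each token 0.5; the cancellation of the second token is the interference effect a real-valued probability mixture cannot express.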
AI · Bullish · arXiv – CS AI · Mar 3 · 5/10 · 5
🧠Researchers developed SMDIM, a new diffusion model for symbolic music generation that efficiently handles long sequences by combining global structure construction with local refinement. The model outperforms existing approaches in both generation quality and computational efficiency across various musical styles including Western classical, popular, and folk music.