AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers propose a modified Transformer encoder that explicitly separates positional and semantic information into three independent streams, revealing that positional data naturally collapses into a low-frequency 2D structure and that standard encoding methods fail to preserve macroscopic positional information under language modeling pressure.
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠A comprehensive survey examines how Mixture-of-Experts (MoE) architectures address multimodal learning challenges by enabling scalable modeling, enriching representation learning across modalities, and adapting to imperfect data scenarios. The research identifies critical gaps in interpretable routing, expert communication, and lifelong multimodal learning, positioning MoE as a foundational framework for building more efficient and flexible AI systems.
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers demonstrate that model architecture significantly affects how well neural networks handle FP4 quantization for medical image analysis. Swin Transformers maintain quality across different quantization recipes and scales, while CNNs degrade under certain conditions, establishing practical guidelines for deploying efficient anomaly segmentation models.
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers demonstrate that the Muon optimizer significantly outperforms Adam when training equivariant neural networks, which encode geometric symmetries by design. Analysis of trained models reveals Muon produces solutions with more regular loss surfaces, higher weight ranks, and better-conditioned representations, suggesting optimizer choice substantially influences how neural networks learn geometric constraints.
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers introduce Vector Networks (VN), a neural architecture that replaces dense weight matrices with libraries of reusable rank-1 weight atoms, enabling selective composition of network components for novel tasks. The approach demonstrates significant out-of-distribution generalization improvements—up to an order of magnitude better than baselines—when familiar elements must be recombined in new ways, addressing a fundamental limitation in deep learning's ability to handle compositional reasoning.
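To make the rank-1 atom idea concrete, here is a minimal sketch and not the authors' implementation. It assumes a weight matrix is assembled as a weighted sum of outer products drawn from a shared library, so different tasks reuse the same atoms with different selections. The names `compose_weight`, `apply_weight`, `library`, and the coefficient dictionaries are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Atom = Tuple[Vector, Vector]  # (u, v): one rank-1 atom, standing for outer(u, v)


def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    """Outer product u v^T, a rank-1 matrix of shape len(u) x len(v)."""
    return [[ui * vj for vj in v] for ui in u]


def compose_weight(library: Sequence[Atom], coeffs: Dict[int, float]) -> Matrix:
    """Build W = sum_i coeffs[i] * outer(u_i, v_i) from the selected atoms only.

    `coeffs` maps atom index -> mixing coefficient (a hypothetical selection scheme).
    """
    if not coeffs:
        raise ValueError("select at least one atom")
    first = next(iter(coeffs))
    rows, cols = len(library[first][0]), len(library[first][1])
    weight = [[0.0] * cols for _ in range(rows)]
    for index, coeff in coeffs.items():
        u, v = library[index]
        atom = outer(u, v)
        for r in range(rows):
            for c in range(cols):
                weight[r][c] += coeff * atom[r][c]
    return weight


def apply_weight(weight: Matrix, x: Sequence[float]) -> Vector:
    """Matrix-vector product W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


# A tiny shared library of three atoms mapping 3 inputs to 2 outputs.
library: List[Atom] = [
    ([1.0, 0.0], [1.0, 0.0, 0.0]),
    ([0.0, 1.0], [0.0, 1.0, 0.0]),
    ([1.0, 1.0], [0.0, 0.0, 1.0]),
]

# Two "tasks" recombine the same atoms in different ways.
task_a = compose_weight(library, {0: 1.0, 1: 1.0})
task_b = compose_weight(library, {0: 1.0, 2: 2.0})
output_b = apply_weight(task_b, [1.0, 2.0, 3.0])
```

The point of the sketch is only that a new task needs a new selection of coefficients, not new dense weights; how the paper learns or selects atoms is not covered here.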
AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠Researchers show that Vision Transformers face fundamental architectural limits on spatial reasoning tasks that stem from computational complexity constraints. By framing spatial understanding as a group homomorphism problem, they prove that constant-depth ViTs cannot capture non-solvable spatial structures such as 3D rotations, revealing a gap between the complexity class these models can compute and the class such tasks require.
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠Researchers demonstrate that scale vectors in large language models, despite accounting for a negligible share of model parameters, significantly affect training performance and optimization. Through theoretical analysis and empirical validation across models from 0.12B to 2B parameters, the study proposes three complementary improvements to scale vector design that enhance training efficiency without adding computational overhead.
AI · Bullish · arXiv – CS AI · May 27 · 6/10
🧠Researchers propose PIPO (Pair-In, Pair-Out), a novel technique that combines input compression and multi-token prediction to accelerate large language model inference. The method eliminates expensive verification steps while achieving up to 2.64x speedups in first-token latency and demonstrating significant improvements on reasoning benchmarks.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce MS-FLOW, a machine learning framework that improves multivariate time series forecasting by using sparse, selective connections between variables rather than dense interactions. The approach addresses the problem of spurious correlations that plague existing methods, achieving state-of-the-art accuracy on 12 benchmarks while identifying fewer but more reliable dependencies.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers demonstrate that early layers of cohort-trained Implicit Neural Representations (INRs) encode transferable features for signal fitting, identifying optimal freezing points through weight stable rank analysis. Using sparse autoencoders for mechanistic interpretability, they reveal that SIREN and Fourier-feature MLPs learn fundamentally different dictionary representations despite comparable performance, with implications for designing more generalizable neural architectures.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce mHC-SSM, a novel architecture combining Manifold-Constrained Hyper-Connections with state space language models using stream-specialized adapters. The approach achieves significant perplexity improvements (572.91 to 461.88) on WikiText-2 benchmarks with predictable efficiency tradeoffs in throughput and memory usage.
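For scale, the reported perplexity figures can be turned into a relative reduction and a per-token loss difference. The short calculation below relies only on the two numbers quoted above and on the standard definition of perplexity as the exponential of mean cross-entropy in nats; the variable names are illustrative.

```python
import math

baseline_ppl = 572.91  # perplexity before, as quoted in the summary
improved_ppl = 461.88  # perplexity after, as quoted in the summary

# Fraction by which perplexity dropped relative to the baseline.
relative_reduction = (baseline_ppl - improved_ppl) / baseline_ppl

# Since perplexity = exp(mean cross-entropy), the loss difference is a log ratio.
loss_reduction_nats = math.log(baseline_ppl) - math.log(improved_ppl)

print(f"relative perplexity reduction: {relative_reduction:.1%}")
print(f"cross-entropy reduction: {loss_reduction_nats:.3f} nats per token")
```

This works out to a reduction of roughly 19 percent in perplexity, which is a fairly modest change in per-token loss given how high both absolute values are.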
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce CDS4RAG, a novel optimization framework that improves Retrieval-Augmented Generation systems by cyclically optimizing retriever and generator hyperparameters separately rather than treating them as a monolithic unit. The method achieves up to 1.54x improvements in generation quality while demonstrating faster convergence across multiple benchmarks and language models.
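The cyclic idea can be illustrated with a generic alternating search: tune one block of hyperparameters while the other is frozen, then swap. This is a sketch under stated assumptions rather than the CDS4RAG algorithm itself. The function `cyclic_search`, the grids, and `toy_score` (a stand-in for a real generation-quality metric) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def cyclic_search(
    score: Callable[[Config, Config], float],
    retriever_grid: List[Config],
    generator_grid: List[Config],
    cycles: int = 3,
) -> Tuple[Config, Config, float]:
    """Alternate between tuning the retriever and the generator, one block at a time."""
    retriever, generator = retriever_grid[0], generator_grid[0]
    best = score(retriever, generator)
    for _ in range(cycles):
        improved = False
        # Phase 1: tune the retriever with the generator held fixed.
        for candidate in retriever_grid:
            value = score(candidate, generator)
            if value > best:
                retriever, best, improved = candidate, value, True
        # Phase 2: tune the generator with the retriever held fixed.
        for candidate in generator_grid:
            value = score(retriever, candidate)
            if value > best:
                generator, best, improved = candidate, value, True
        if not improved:
            break  # a full cycle changed nothing, so stop early
    return retriever, generator, best


def toy_score(retriever: Config, generator: Config) -> float:
    """Hypothetical quality metric that peaks at top_k=5, temperature=0.3."""
    return -((retriever["top_k"] - 5) ** 2) - 10 * (generator["temperature"] - 0.3) ** 2


retriever_grid = [{"top_k": k} for k in (1, 3, 5, 10)]
generator_grid = [{"temperature": t} for t in (0.0, 0.3, 0.7, 1.0)]
best_retriever, best_generator, best_score = cyclic_search(
    toy_score, retriever_grid, generator_grid
)
```

With 4 values per block, each cycle costs 8 evaluations instead of the 16 a joint grid would need, and the gap widens as grids grow. The toy objective is separable, so a single cycle suffices here; real retriever and generator settings interact, which is why repeating the cycle matters.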
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce Lattice Deduction Transformers (LDT), a specialized neural architecture that achieves near-perfect accuracy on constraint-solving puzzles such as Sudoku and mazes while remaining logically sound. The approach demonstrates that smaller models with domain-specific architectures can outperform large language models on reasoning tasks.
AI · Neutral · arXiv – CS AI · May 12 · 5/10
🧠Researchers introduce KANMultiSign, a neural network framework that converts sign language notation into pose animations using Kolmogorov-Arnold Networks integrated with Transformers. The system achieves improved accuracy with fewer parameters across multiple sign languages, demonstrating that multi-scale supervision is the key driver of performance gains.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce TIDES, a new selective state space model architecture that combines the expressivity of input-dependent models like Mamba with the native irregular time-series handling of continuous-time models like S5. By moving input-dependence to the state matrix rather than the discretization step, TIDES maintains the physical meaning of time intervals while preserving per-token expressivity, achieving state-of-the-art results on time-series benchmarks.
AI · Bullish · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce HyperEyes, a parallel multimodal search agent that processes multiple entities concurrently rather than sequentially, achieving 9.9% higher accuracy with 5.3x fewer tool calls than comparable systems. The system combines visual grounding and retrieval into atomic actions and uses dual-level reinforcement learning to optimize both accuracy and inference efficiency, addressing a gap in existing multimodal AI benchmarks that ignore computational cost.
AI · Bullish · arXiv – CS AI · May 4 · 6/10
🧠Researchers propose Persistent Visual Memory (PVM), a lightweight module that addresses visual signal degradation in Large Vision-Language Models by maintaining consistent visual perception during long text generation. Integrated into Qwen3-VL models, PVM demonstrates measurable accuracy improvements with minimal computational overhead, particularly benefiting complex reasoning tasks.
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠TimeSAF introduces a hierarchical asynchronous fusion framework that improves how large language models guide time series forecasting by decoupling semantic understanding from numerical dynamics. This addresses a fundamental architectural limitation in existing methods and demonstrates superior performance on standard benchmarks with strong generalization capabilities.
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers propose AR-KAN, a neural network combining autoregressive models with Kolmogorov-Arnold Networks for improved time series forecasting. The model addresses limitations of traditional deep learning approaches by integrating temporal memory preservation with nonlinear function approximation, demonstrating superior performance on both synthetic and real-world datasets.
AI · Bullish · arXiv – CS AI · Apr 10 · 6/10
🧠Researchers introduce S³ (Stratified Scaling Search), a test-time scaling method for diffusion language models that improves output quality by reallocating compute during the denoising process rather than relying on simple best-of-K sampling. The technique uses a lightweight verifier to evaluate and selectively resample candidate trajectories at each step, demonstrating consistent performance gains across mathematical reasoning and knowledge tasks without requiring model retraining.
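The contrast with best-of-K can be sketched as a generic verifier-guided resampling loop: instead of scoring K finished samples once, partial trajectories are scored after every step and weak ones are replaced by copies of strong ones. This is closer to a particle-filter style search than to the paper's exact procedure, which the summary does not specify. `guided_search`, `toy_propose`, and `toy_verify` are hypothetical names, and the integer "trajectory" merely stands in for a partially denoised sequence.

```python
import random
from typing import Callable, List

Trajectory = List[int]


def guided_search(
    propose: Callable[[Trajectory, random.Random], Trajectory],
    verify: Callable[[Trajectory], float],
    num_candidates: int,
    num_steps: int,
    keep_fraction: float = 0.5,
    seed: int = 0,
) -> List[Trajectory]:
    """Advance a pool of candidates, resampling toward high-scoring ones each step."""
    rng = random.Random(seed)
    candidates: List[Trajectory] = [[] for _ in range(num_candidates)]
    keep = max(1, int(num_candidates * keep_fraction))
    for _ in range(num_steps):
        # Advance every candidate by one step (stand-in for one denoising step).
        candidates = [propose(c, rng) for c in candidates]
        # Score the partial trajectories and keep the best ones.
        ranked = sorted(candidates, key=verify, reverse=True)
        survivors = ranked[:keep]
        # Refill the pool with copies of survivors so compute follows promising paths.
        refill = [list(rng.choice(survivors)) for _ in range(num_candidates - keep)]
        candidates = [list(s) for s in survivors] + refill
    return sorted(candidates, key=verify, reverse=True)


def toy_propose(trajectory: Trajectory, rng: random.Random) -> Trajectory:
    """Hypothetical step: append a random move of -1, 0, or +1."""
    return trajectory + [rng.choice([-1, 0, 1])]


def toy_verify(trajectory: Trajectory) -> float:
    """Hypothetical verifier: prefer trajectories whose sum is close to 4."""
    return -abs(sum(trajectory) - 4)


results = guided_search(toy_propose, toy_verify, num_candidates=8, num_steps=6)
best = results[0]
```

Plain best-of-K corresponds to skipping the ranking and refill inside the loop and sorting only at the end; the model itself is untouched in both cases, which matches the summary's point that no retraining is needed.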
AI · Neutral · arXiv – CS AI · Mar 5 · 4/10
🧠Researchers developed a memory-augmented transformer that uses attention for retrieval, consolidation, and write-back operations, with lateralized memory banks connected through inhibitory cross-talk. The inhibitory coupling mechanism enables functional specialization between memory banks, achieving superior performance on episodic recall tasks while maintaining rule-based prediction capabilities.