y0news

#neural-architecture News & Analysis

36 articles tagged with #neural-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 6d ago · 7/10

Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers

Researchers introduce Tensor Memory, a fixed-size recurrent module that augments Transformers with persistent 3D spatial state for improved long-sequence processing. The approach enables better video understanding and occlusion reasoning by decoupling memory capacity from input length while maintaining computational efficiency.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling

Researchers introduce MP-SSM, a novel framework that integrates State-Space Model principles into message-passing neural networks for improved graph learning. The approach achieves permutation equivariance, computational efficiency, and long-range information propagation while enabling theoretical analysis of gradient flow and information dynamics across deep networks.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models

Researchers propose STARS, a training framework that stabilizes Looped Language Models (LoopLMs) to enable reliable test-time scaling through latent reasoning. The method uses Jacobian Spectral Radius Regularization to constrain neural states toward stable fixed points, addressing a critical problem where model performance peaks then collapses with increased recurrence depth.
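The link between a Jacobian's spectral radius and stable fixed points can be illustrated with a toy linear loop. This is a minimal sketch, not the STARS regularizer: the map `h <- A h + b`, the matrices `STABLE` and `UNSTABLE`, and all function names are hypothetical choices made for illustration. For a linear map the Jacobian is simply `A`, so the loop settles to a fixed point when the spectral radius of `A` is below 1 and blows up when it is above 1.

```python
import math

def matvec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def spectral_radius(A, iters=200):
    """Estimate the spectral radius by power iteration.

    Assumes A is symmetric with a unique dominant eigenvalue,
    which holds for the two illustrative matrices below.
    """
    v = [1.0] * len(A)
    for _ in range(iters):
        w = matvec(A, v)
        n = norm(w)
        v = [x / n for x in w]
    return norm(matvec(A, v))

def run_loop(A, b, steps):
    """Apply the looped update h <- A h + b for a number of steps, starting at zero."""
    h = [0.0] * len(b)
    for _ in range(steps):
        h = [y + c for y, c in zip(matvec(A, h), b)]
    return h

# Hypothetical Jacobians: one contractive, one expansive.
STABLE = [[0.5, 0.1], [0.1, 0.3]]
UNSTABLE = [[1.1, 0.1], [0.1, 0.9]]

if __name__ == "__main__":
    for name, A in (("stable", STABLE), ("unstable", UNSTABLE)):
        rho = spectral_radius(A)
        h = run_loop(A, [1.0, 1.0], 100)
        print(name, "spectral radius:", round(rho, 4), "state norm after 100 loops:", norm(h))
```

With the contractive matrix, extra loop iterations leave the state essentially unchanged; with the expansive one, the state norm grows with depth, which is the toy analogue of performance collapsing as recurrence depth increases.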

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Echo-LoRA: Parameter-Efficient Fine-Tuning via Cross-Layer Representation Injection

Echo-LoRA is a parameter-efficient fine-tuning method that injects cross-layer representations from deeper layers into shallow LoRA modules during training, improving performance on reasoning tasks by 3-5.7% without adding inference cost. The auxiliary training path is discarded after training, so the deployed model keeps the efficiency of standard LoRA.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

Researchers introduce Toeplitz MLP Mixer (TMM), a transformer alternative that replaces attention mechanisms with triangular-masked Toeplitz matrix multiplication, achieving O(dn log n) training complexity and O(dn) inference complexity. TMMs demonstrate superior training efficiency, information retention, and in-context learning performance compared to existing sub-quadratic architectures.
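The O(n log n) figure follows from a standard fact: multiplying by a lower-triangular (causal) Toeplitz matrix is the same as a causal convolution, which an FFT can compute without materializing the n-by-n matrix. The sketch below shows that equivalence for a single channel in pure Python. It is an illustration under assumptions, not the paper's implementation; the function names and the weight vector `w` are hypothetical.

```python
import cmath

def fft(x, invert=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], invert)
    odd = fft(x[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def toeplitz_causal_direct(w, x):
    """O(n^2) product with the lower-triangular Toeplitz matrix T[i][j] = w[i - j]."""
    n = len(x)
    return [sum(w[i - j] * x[j] for j in range(i + 1)) for i in range(n)]

def toeplitz_causal_fft(w, x):
    """Same product in O(n log n) via zero-padded FFT convolution."""
    n = len(x)
    size = 1
    while size < 2 * n:  # pad so circular convolution equals linear convolution
        size *= 2
    fw = fft([complex(v) for v in w] + [0j] * (size - n))
    fx = fft([complex(v) for v in x] + [0j] * (size - n))
    y = fft([a * b for a, b in zip(fw, fx)], invert=True)
    # The unnormalized inverse transform needs a 1/size scale; keep the first n outputs.
    return [(v / size).real for v in y[:n]]

if __name__ == "__main__":
    w = [0.5, -1.0, 2.0, 0.25]  # one weight per relative offset
    x = [1.0, 2.0, 3.0, 4.0]    # a length-4 sequence for one channel
    print(toeplitz_causal_direct(w, x))
    print(toeplitz_causal_fft(w, x))
```

Each output position only sees earlier positions, and the mixing weights depend only on the relative offset `i - j`, so a length-n sequence needs n weights per channel instead of n squared.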

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Normalized Architectures are Natively 4-Bit

Researchers demonstrate that nGPT, a neural architecture that normalizes weights and hidden representations to a unit hypersphere, achieves stable 4-bit precision training without requiring additional quantization interventions. The approach leverages mathematical properties of dot products to maintain stronger signal-to-noise ratios, enabling efficient training of models up to 30B parameters.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

Universal statistical signatures of evolution in artificial intelligence architectures

A comprehensive study analyzing 935 ablation experiments from 161 publications reveals that artificial intelligence architectural evolution follows the same statistical laws as biological evolution, with a heavy-tailed distribution of fitness effects placing AI between viral genomes and simple organisms. The findings suggest that evolutionary statistical structure is substrate-independent and determined by fitness landscape topology rather than the underlying selection mechanism.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning

Researchers introduce GRIP, a unified framework that integrates retrieval decisions directly into language model generation through control tokens, eliminating the need for external retrieval controllers. The system enables models to autonomously decide when to retrieve information, reformulate queries, and terminate retrieval within a single autoregressive process, achieving performance competitive with GPT-4o while using substantially fewer parameters.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise

Researchers demonstrate that expert specialization in Mixture-of-Experts (MoE) large language models emerges from hidden-state geometry rather than from specialized routing architecture, challenging common assumptions about how these systems work. Expert routing patterns resist human interpretation across models and tasks, suggesting that understanding MoE specialization is as difficult as the broader unsolved problem of interpreting LLM internal representations.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Zero-shot World Models Are Developmentally Efficient Learners

Researchers introduce Zero-shot Visual World Models (ZWM), a computational framework inspired by how young children learn physical understanding from minimal data. The approach combines sparse prediction, causal inference, and compositional reasoning to achieve data-efficient learning, demonstrating that AI systems can match child development patterns while learning from single-child observational data.

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Do We Need Distinct Representations for Every Speech Token? Unveiling and Exploiting Redundancy in Large Speech Language Models

Researchers demonstrate that large speech language models contain significant redundancy in their token representations, particularly in deeper layers. By introducing Affinity Pooling, a training-free token merging technique, they achieve a 27.48% reduction in prefilling FLOPs and up to 1.7× memory savings while maintaining semantic accuracy, challenging the necessity of fully distinct tokens for acoustic processing.

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

SRAM-Based Compute-in-Memory Accelerator for Linear-decay Spiking Neural Networks

Researchers developed an SRAM-based compute-in-memory accelerator for spiking neural networks that uses a linear decay approximation instead of exponential decay, achieving a 1.1x to 16.7x reduction in energy consumption. The design addresses the bottleneck of neuron state updates in neuromorphic computing by performing in-place decay directly within memory arrays.
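The difference between the two decay rules can be sketched with a toy leaky integrate-and-fire neuron. This is a minimal illustration, not the accelerator's actual neuron model: the threshold, the reset-to-zero rule, the clamp at zero, and every function name here are assumptions. Exponential decay multiplies the membrane potential by a factor each step, whereas linear decay subtracts a constant, which is a cheaper update.

```python
def exponential_decay(v, alpha):
    """Exponential leak: scale the membrane potential by alpha (0 < alpha < 1)."""
    return v * alpha

def linear_decay(v, step):
    """Linear leak: subtract a fixed amount, clamped at zero (assumed floor)."""
    return max(v - step, 0.0)

def run_neuron(inputs, decay_fn, threshold=1.0):
    """Simulate one neuron: decay, integrate the input, spike and reset at threshold."""
    v = 0.0
    spikes = []
    for x in inputs:
        v = decay_fn(v) + x
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # assumed reset-to-zero after a spike
        else:
            spikes.append(0)
    return spikes

if __name__ == "__main__":
    inputs = [0.5, 0.0, 0.5, 0.5, 0.0, 0.0, 0.5]
    print("exponential:", run_neuron(inputs, lambda v: exponential_decay(v, 0.5)))
    print("linear:     ", run_neuron(inputs, lambda v: linear_decay(v, 0.125)))
```

The two rules can produce different spike trains for the same input, so how closely a linear approximation tracks the exponential one depends on the chosen decay constants.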

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Architectural Proprioception in State Space Models: Thermodynamic Training Induces Anticipatory Halt Detection

Researchers introduce the Probability Navigation Architecture (PNA), a framework that trains State Space Models (SSMs) with thermodynamic principles. They report that SSMs trained this way develop "architectural proprioception": the ability to predict when to stop computation based on internal state entropy. According to the authors, Transformers trained the same way do not show this behavior, a difference with implications for efficient AI inference systems.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4

Learning Internal Biological Neuron Parameters and Complexity-Based Encoding for Improved Spiking Neural Networks Performance

Researchers developed a learning approach for spiking neural networks that optimizes both synaptic weights and intrinsic neuronal parameters, improving classification accuracy by up to 13.50 percentage points. The study introduces a biologically inspired SNN-LZC classifier that achieves 99.50% accuracy with sub-millisecond inference latency.

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders

Researchers propose a modified Transformer encoder that explicitly separates positional and semantic information into three independent streams, revealing that positional data naturally collapses into a low-frequency 2D structure and that standard encoding methods fail to preserve macroscopic positional information under language modeling pressure.

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

Researchers introduce the Cognitive Categorical Transformer (CCT), a 306M-parameter language model that applies category-theoretic principles to improve upon GPT-2 Small, achieving a 12% relative perplexity reduction on WikiText-103. The work provides empirical evidence that simplicial message passing improves language modeling performance and identifies a distinction between topology-adding and consistency-enforcing categorical priors.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

On the Intrinsic Limits of Transformer Image Embeddings in Non-Solvable Spatial Reasoning

Researchers demonstrate that Vision Transformers face fundamental architectural limitations in spatial reasoning tasks due to computational complexity constraints. By framing spatial understanding as a group homomorphism problem, they prove that constant-depth ViTs cannot capture non-solvable spatial structures like 3D rotations, revealing a theoretical gap between required complexity classes.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

How the Optimizer Shapes Learned Solutions in Equivariant Neural Networks

Researchers demonstrate that the Muon optimizer significantly outperforms Adam when training equivariant neural networks, which encode geometric symmetries by design. Analysis of trained models reveals Muon produces solutions with more regular loss surfaces, higher weight ranks, and better-conditioned representations, suggesting optimizer choice substantially influences how neural networks learn geometric constraints.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

Not All NVFP4 QAT Recipes Are Equal: How Architecture and Scale Shape Model Quality for Anomaly Segmentation

Researchers demonstrate that model architecture significantly affects how well neural networks tolerate NVFP4 quantization-aware training (QAT) for medical image analysis. Swin Transformers maintain quality across different quantization recipes and scales, while CNNs degrade under certain conditions, establishing practical guidelines for deploying efficient anomaly segmentation models.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

Learning Compositional Latent Structure with Vector Networks

Researchers introduce Vector Networks (VN), a neural architecture that replaces dense weight matrices with libraries of reusable rank-1 weight atoms, enabling selective composition of network components for novel tasks. The approach improves out-of-distribution generalization by up to an order of magnitude over baselines when familiar elements must be recombined in new ways, addressing a fundamental limitation in deep learning's ability to handle compositional reasoning.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10

Tackling Multimodal Learning Challenges with Mixture-of-Expert: A Survey

A comprehensive survey examines how Mixture-of-Experts (MoE) architectures address multimodal learning challenges by enabling scalable modeling, enriching representation learning across modalities, and adapting to imperfect data scenarios. The research identifies critical gaps in interpretable routing, expert communication, and lifelong multimodal learning, positioning MoE as a foundational framework for building more efficient and flexible AI systems.

AI · Neutral · arXiv – CS AI · May 27 · 6/10

Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

Researchers demonstrate that scale vectors in large language models, despite accounting for a negligible share of model parameters, significantly affect training performance and optimization. Through theoretical analysis and empirical validation across models from 0.12B to 2B parameters, the study proposes three complementary improvements to scale vector design that enhance training efficiency without adding computational overhead.

AI · Bullish · arXiv – CS AI · May 27 · 6/10

Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs

Researchers propose PIPO (Pair-In, Pair-Out), a novel technique that combines input compression and multi-token prediction to accelerate large language model inference. The method eliminates expensive verification steps while achieving up to 2.64x speedups in first-token latency and demonstrating significant improvements on reasoning benchmarks.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

mHC-SSM: Manifold-Constrained Hyper-Connections for State Space Language Models with Stream-Specialized Adapters

Researchers introduce mHC-SSM, a novel architecture combining Manifold-Constrained Hyper-Connections with state space language models using stream-specialized adapters. The approach achieves significant perplexity improvements (572.91 to 461.88) on WikiText-2 benchmarks with predictable efficiency tradeoffs in throughput and memory usage.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

CDS4RAG: Cyclic Dual-Sequential Hyperparameter Optimization for RAG

Researchers introduce CDS4RAG, a novel optimization framework that improves Retrieval-Augmented Generation systems by cyclically optimizing retriever and generator hyperparameters separately rather than treating them as a monolithic unit. The method achieves up to 1.54x improvements in generation quality while demonstrating faster convergence across multiple benchmarks and language models.
