#neural-architecture News & Analysis

78 articles tagged with #neural-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

🧠

The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise

Researchers demonstrate that expert specialization in Mixture-of-Experts (MoE) large language models emerges from the geometry of hidden states rather than from a specialized routing architecture, challenging assumptions about how these systems work. Expert routing patterns resist human interpretation across models and tasks, suggesting that understanding MoE specialization remains as difficult as the broader unsolved problem of interpreting LLM internal representations.
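
To make the geometry point concrete, here is a toy sketch of a linear top-k router, assuming (this is not taken from the paper) that routing is a dot product between a hidden state and one router row per expert. The router rows, the hidden vectors, and the "domain" labels in the comments are all hypothetical. Two inputs that sit close together in hidden space land on the same expert regardless of which domain they nominally belong to.

```python
def route(hidden, router_rows, k=1):
    """Return the indices of the k experts with the highest router logits."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in router_rows]
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return ranked[:k]

# Hypothetical router with three experts in a 2-D hidden space.
router_rows = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

# Hypothetical hidden states: the first two come from different "domains"
# but point in nearly the same direction.
h_code = [0.9, 0.1]
h_poem = [0.8, 0.2]
h_other = [0.1, 0.9]

choice_code = route(h_code, router_rows)
choice_poem = route(h_poem, router_rows)
choice_other = route(h_other, router_rows)
print(choice_code, choice_poem, choice_other)
```

In this toy setup the expert choice depends only on where the hidden state points, which is the sense in which routing can track geometry without tracking domain.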

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

🧠

Do We Need Distinct Representations for Every Speech Token? Unveiling and Exploiting Redundancy in Large Speech Language Models

Researchers demonstrate that large speech language models contain significant redundancy in their token representations, particularly in deeper layers. By introducing Affinity Pooling, a training-free token merging technique, they achieve a 27.48% reduction in prefilling FLOPs and up to 1.7× memory savings while maintaining semantic accuracy, challenging the necessity of fully distinct tokens for acoustic processing.

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

🧠

SRAM-Based Compute-in-Memory Accelerator for Linear-decay Spiking Neural Networks

Researchers developed an SRAM-based compute-in-memory accelerator for spiking neural networks that uses a linear decay approximation instead of exponential decay, achieving a 1.1× to 16.7× reduction in energy consumption. The innovation addresses the bottleneck of neuron state updates in neuromorphic computing by performing in-place decay directly within memory arrays.
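
The difference between the two decay rules can be sketched in a few lines. This is an illustration only, not the accelerator's actual update: it assumes a non-negative membrane potential with a resting value of 0, and the `step` size (chosen here to match the initial slope of the exponential) is a hypothetical choice.

```python
import math

def exponential_decay(v, dt, tau):
    """Standard leaky update: multiply by exp(-dt / tau) each tick."""
    return v * math.exp(-dt / tau)

def linear_decay(v, step):
    """Linear approximation: subtract a constant each tick, clamped at zero."""
    return max(v - step, 0.0)

def simulate(v0, ticks, tau=10.0, dt=1.0):
    """Return a list of (exponential, linear) potentials, one pair per tick."""
    step = v0 * dt / tau  # hypothetical: match the exponential's initial slope
    v_exp, v_lin = v0, v0
    trace = []
    for _ in range(ticks):
        v_exp = exponential_decay(v_exp, dt, tau)
        v_lin = linear_decay(v_lin, step)
        trace.append((v_exp, v_lin))
    return trace

trace = simulate(1.0, 20)
for tick, (v_exp, v_lin) in enumerate(trace, start=1):
    print(f"tick {tick:2d}: exp={v_exp:.4f} linear={v_lin:.4f}")
```

The linear rule needs only a subtraction and a clamp per tick, which is the kind of operation that is cheap to perform in place, whereas the exponential rule needs a multiplication.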

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

🧠

Architectural Proprioception in State Space Models: Thermodynamic Training Induces Anticipatory Halt Detection

Researchers introduce the Probability Navigation Architecture (PNA) framework, which trains State Space Models (SSMs) using thermodynamic principles, and report that SSMs develop 'architectural proprioception': the ability to predict when to stop computation based on internal state entropy. According to the authors, SSMs can achieve this form of computational self-awareness while Transformers cannot, with significant implications for efficient AI inference systems.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4

🧠

Learning Internal Biological Neuron Parameters and Complexity-Based Encoding for Improved Spiking Neural Networks Performance

Researchers developed a novel learning approach for spiking neural networks that optimizes both synaptic weights and intrinsic neuronal parameters, improving classification accuracy by up to 13.50 percentage points. The study introduces a biologically inspired SNN-LZC classifier that achieves 99.50% accuracy with sub-millisecond inference latency.

AI · Bullish · arXiv – CS AI · Jun 25 · 6/10

🧠

Lightweight PCGAE-Net: Parallel CrossGate Attention and Bottleneck AutoEncoder for Efficient 5G Channel Prediction

Researchers introduce Lightweight PCGAE-Net, a new neural network architecture that reduces 5G channel prediction model size by 58% while improving accuracy by up to 6.0 dB. The model addresses architectural inefficiencies in existing transformers through parallel attention mechanisms and a bottleneck autoencoder, enabling deployment on base-station hardware with computational constraints.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

🧠

Evolutionary Optimization Reveals Structural Constraints on Reservoir Architecture for Spatiotemporal Chaos

Researchers used evolutionary algorithms to optimize reservoir computing architectures for predicting spatiotemporal chaos, discovering that evolution naturally converges on specific structural constraints rather than randomly improving networks. The findings reveal that task-driven optimization stabilizes particular dynamical classes and refines only the most prediction-relevant architectural features, providing insights into how biological systems adapt their information-processing networks.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

🧠

Neuronal Stochastic Attention Circuit (NSAC) for Probabilistic Representation Learning

Researchers introduce NSAC, a biologically-inspired continuous-time attention architecture that quantifies uncertainty in representation learning by reformulating attention computation as a stochastic differential equation. The approach combines theoretical stability guarantees with practical applications across forecasting, autonomous vehicles, and industrial systems, advancing uncertainty quantification in neural networks.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

🧠

Robust Auto-associative Memory via Convolutional Restricted Hopfield Networks

Researchers propose Convolutional Restricted Hopfield Networks (CRHNs), a new associative memory model that combines convolutional feature extraction with attractor-based retrieval to improve robustness against adversarial attacks and data corruption. Experiments demonstrate CRHNs achieve significantly lower reconstruction errors than existing models like Modern Hopfield Networks and Predictive Coding Networks, with improvements up to an order of magnitude under various perturbation conditions.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

🧠

Comparing Transformers and Hybrid Models at the Token Level

Researchers comparing hybrid language models (mixing attention and recurrent layers) against pure transformers using Olmo weights find that hybrids excel at semantic state tracking but underperform on syntactic tasks like bracket matching. The analysis reveals that recurrent layers and attention mechanisms have complementary strengths, with gains concentrated in open-class words and semantic tasks rather than function words or n-gram prediction.

AI · Bullish · arXiv – CS AI · Jun 23 · 6/10

🧠

Topological Neural Dynamics: A Neuron-wise Framework for Sequence Modeling

Researchers introduce Topological Neural Dynamics (TND), a novel sequence modeling framework that replaces traditional layer-wise neural computation with neuron-wise dynamics where individual neurons evolve independently through explicit graph topology. In a Pong behavior cloning benchmark, TND outperforms RNNs, LSTMs, continuous-time networks, and Transformers with a catch rate more than three times higher than the strongest baseline, suggesting this architectural approach offers a more effective inductive bias for sequence modeling.

AI · Bullish · arXiv – CS AI · Jun 19 · 6/10

🧠

RoboSSM: Scalable In-context Imitation Learning via State-Space Models

Researchers introduce RoboSSM, a new in-context imitation learning framework that replaces Transformers with state-space models (SSMs) for robotic task learning. The approach demonstrates superior performance on long-context prompts and achieves better generalization to unseen tasks compared to Transformer-based methods, establishing SSMs as a viable alternative backbone for robot learning systems.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

Researchers propose Manifold Power Iteration (MPI), a novel router redesign method for Mixture-of-Experts models that aligns router rows with principal singular directions of associated experts. The approach uses a "Power-then-Retract" paradigm and demonstrates improved MoE model effectiveness across scales from 1B to 11B parameters.
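
For readers unfamiliar with the underlying numerical tool, the sketch below shows generic power iteration for finding the principal singular direction of a weight matrix, with a renormalization after every step. It is not the paper's algorithm: reading "Power-then-Retract" as a power step followed by projection back onto the unit sphere is an assumption, and the matrix `w` and the function `principal_direction` are hypothetical.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def principal_direction(w, iters=100):
    """Approximate the top right-singular vector of w by power iteration."""
    wt = transpose(w)
    v = normalize([1.0] * len(w[0]))
    for _ in range(iters):
        v = matvec(wt, matvec(w, v))  # power step on W^T W
        v = normalize(v)              # pull the iterate back to unit length
    return v

# Hypothetical expert weight matrix whose dominant direction is the first axis.
w = [[3.0, 0.0], [0.0, 1.0]]
direction = principal_direction(w)
print(direction)
```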

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

RoVE: Rotary Value Embeddings Attention for Relative Position-dependent Value Pathways

Researchers introduce RoVE (Rotary Value Embeddings), a parameter-free modification to Rotary Position Embeddings (RoPE) that makes value tokens position-sensitive in attention mechanisms. Testing on GPT-2 models demonstrates consistent improvements in few-shot learning, out-of-distribution performance, and long-context retrieval tasks.

🏢 Perplexity

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

🧠

Information-Theoretic Decomposition for Multimodal Interaction Learning

Researchers introduce DMIL (Decomposition-based Multimodal Interaction Learning), a novel framework that systematically analyzes and learns from dynamic, sample-specific interactions across multiple data modalities. The approach addresses fundamental limitations in existing multimodal learning paradigms by explicitly modeling redundant, unique, and synergistic information components, demonstrating consistent performance improvements across diverse tasks.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

🧠

Routing-Aware Expert Calibration for Machine Unlearning in Mixture-of-Experts Language Models

Researchers propose TRACE, a novel machine unlearning technique designed specifically for Mixture-of-Experts language models that addresses the problem of forget-critical experts receiving insufficient regularization during the unlearning process. The method achieves 9% relative utility improvements by detecting and calibrating expert activation patterns to match forget and retain data distributions, demonstrating consistent performance gains across multiple MoE architectures.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

🧠

Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

Researchers propose AGCLR, a new method that enhances large language models' reasoning capabilities by introducing persistent memory across reasoning steps. The approach addresses a fundamental limitation in continuous latent reasoning where intermediate facts are lost as models explore deeper reasoning paths, demonstrating consistent improvements on mathematical and multi-hop reasoning benchmarks.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning

Researchers introduce SV-QD-RL, a reinforcement learning framework that generates diverse policy repertoires by conditioning actor networks on learned structural masks and pairing them with branch-specific critics. The approach demonstrates improved performance on continuous control tasks while maintaining behavioral diversity through structure-aware archive management.

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

🧠

Late-Layer Fusion is Enough: Dual-Path Vision Token Routing for Multimodal Large Language Models under Visual Saturation

Researchers propose Dual-Path Vision Token Routing (DPVR), a framework that optimizes multimodal large language models by routing vision tokens away from deep transformer layers where they saturate early, instead fusing visual and textual information only in the final layer. The approach reduces computational overhead by 3% while maintaining competitive performance, challenging the assumption that vision tokens must traverse all deep language-model layers.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

🧠

Bidirectional Small-Granularity Search between Code and Text

Researchers introduce a bidirectional search task linking code snippets with text descriptions and vice versa, addressing the gap between scientific publications and their implementations. They present a large dataset with automatically-generated training data and manually-annotated test sets, along with a modular encoder-based approach that achieves strong in-domain results with promising out-of-domain generalization.

🧠 GPT-4

AI · Bullish · arXiv – CS AI · Jun 9 · 6/10

🧠

Deep Tree Tensor Networks

Researchers introduce Deep Tree Tensor Networks (DTTN), a novel neural architecture originating from quantum physics that captures exponential-order feature interactions for image recognition. The model demonstrates superior performance across multiple benchmarks while maintaining parameter efficiency through tree-like topology, potentially advancing interpretable AI research.

AI · Neutral · arXiv – CS AI · Jun 8 · 6/10

🧠

Limitations of Normalization in Attention Mechanism

Researchers present a theoretical and empirical analysis of the limitations of softmax normalization in attention mechanisms, demonstrating that as the number of selected tokens increases, models lose their ability to distinguish important tokens and converge toward uniform selection patterns. The findings highlight gradient sensitivity challenges during training and suggest that improved normalization strategies are needed for more effective attention architectures.
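
A small numerical sketch illustrates the dilution effect. It is not the paper's experiment: it assumes a single "important" token whose logit exceeds every other token's by a fixed `gap` (a hypothetical setup), and tracks how much softmax weight that token keeps as the number of competing tokens grows.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_weight(n_tokens, gap=2.0):
    """Weight on one token whose logit is `gap` above n_tokens - 1 others."""
    scores = [gap] + [0.0] * (n_tokens - 1)
    return softmax(scores)[0]

weights = {n: top_weight(n) for n in (8, 64, 512, 4096)}
for n, w in weights.items():
    print(f"{n:5d} tokens: important-token weight={w:.4f}, uniform={1 / n:.4f}")
```

With the logit gap held fixed, the important token's weight shrinks roughly in proportion to 1/n, so in absolute terms it drifts toward the uniform value as the context grows.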

AI · Neutral · arXiv – CS AI · Jun 4 · 5/10

🧠

RowNet: A Memory Transformer for Tabular Regression

RowNet is a neural architecture that improves real estate price prediction by using memory-based retrieval to identify comparable properties rather than treating each property in isolation. The model combines similarity matching, attention mechanisms, and mixture-of-experts to outperform traditional multilayer perceptrons and gradient-boosted decision trees on tabular regression tasks.

AI · Bullish · arXiv – CS AI · Jun 2 · 6/10

🧠

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Researchers introduce Harness-1, a 20B-parameter search agent that separates semantic decision-making from state management by externalizing working memory to a stateful harness environment. The system achieves 73% average curated recall across eight retrieval benchmarks, outperforming comparable open-source searchers by 11.4 points while generalizing well to held-out transfer tasks.

AI · Neutral · arXiv – CS AI · Jun 2 · 5/10

🧠

Enhancing BiGRU with a KAN Block for Legal Document Classification and Summarization

Researchers have developed a novel neural architecture combining Kolmogorov-Arnold Networks (KAN) with BiGRU models for classifying and summarizing legal documents in multilingual, low-resource settings. Tested on Bengali, English, and transliterated Bengali legal documents from Bangladesh, the hybrid model achieved 67.96% classification accuracy while demonstrating that KAN integration improved performance by over 10 percentage points.
