#neural-architecture News & Analysis

78 articles tagged with #neural-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

ATMA: Length-Invariant Language Modeling via Polar Attention and Gated-Delta Compression Memory

Researchers introduce ATMA, a hybrid attention architecture that targets the long-context problem in language models by combining polar attention with gated-delta compression memory. The system maintains 90%+ retrieval accuracy at 64K tokens (32x the training length) while perplexity improves monotonically, addressing a fundamental limitation of softmax attention, which degrades on longer sequences.

🏢 Perplexity

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning

Researchers introduce ACT-JEPA, an architecture that combines imitation learning with self-supervised learning to improve policy representations for decision-making systems. It jointly predicts action sequences and latent observation sequences, working in latent space rather than on raw inputs, and achieves up to 40% improvement in world model understanding and 10% higher task success rates.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

CauScale: Neural Causal Discovery at Scale

CauScale is a neural architecture for causal discovery, a critical capability for scientific AI and data analysis, that scales to graphs with up to 1,000 nodes. The system achieves 99.6% accuracy on standard benchmarks while running inference 4x to 13,000x faster than existing methods, easing the computational bottlenecks that previously limited causal discovery to smaller datasets.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

Keyless Attention: Value-Space Routing and Value-Only Caching for Efficient Transformers

Researchers propose Keyless Attention, a transformer mechanism that eliminates key projections to reduce KV cache memory by 50% while maintaining or improving performance across multiple model architectures. The approach introduces a value-space routing matrix that replaces the traditional key projection, demonstrating competitive results on perplexity and downstream benchmarks.

🏢 Perplexity · 🧠 Llama
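
The 50% figure in the Keyless Attention summary is what simple cache accounting gives when keys and values have the same shape and only values are kept. A minimal sketch of that arithmetic follows; the function name `cache_bytes` and the model dimensions are hypothetical and are not taken from the paper.

```python
def cache_bytes(layers, kv_heads, head_dim, seq_len,
                bytes_per_elem=2, tensors_per_layer=2):
    """Bytes needed to cache `tensors_per_layer` tensors per layer.

    tensors_per_layer=2 models a standard K+V cache;
    tensors_per_layer=1 models a value-only cache.
    """
    per_token = layers * kv_heads * head_dim * tensors_per_layer * bytes_per_elem
    return per_token * seq_len


# Hypothetical configuration (illustrative only): 32 layers, 8 KV heads,
# head dimension 128, 2-byte (fp16) entries, 8,192-token context.
standard = cache_bytes(32, 8, 128, 8192)                         # keys + values
value_only = cache_bytes(32, 8, 128, 8192, tensors_per_layer=1)  # values only

print(standard / 2**20, "MiB vs", value_only / 2**20, "MiB")
print("reduction:", 1 - value_only / standard)
```

The saving is independent of the chosen dimensions because every term other than `tensors_per_layer` cancels in the ratio.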

AI · Neutral · arXiv – CS AI · Jun 11 · 7/10

From Architecture to Output: Structural Origins of Hallucination in Large Language Models and the Amplifying Role of Data

Researchers identify three core architectural mechanisms in large language models that systematically produce hallucinations: self-attention's statistical confusion of entities, maximum likelihood training that rewards plausible-sounding falsehoods, and autoregressive decoding that cascades errors forward. Dataset quality issues amplify rather than originate these failures, suggesting that fixing hallucinations requires architectural redesign, not just better training data.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

From Prompts to Tokens: Internalizing Causal Supervision in Vision-Language Model for Multi-Image Causal Reasoning

Researchers introduce BridgeVLM, a vision-language model that internalizes causal reasoning by converting visual inputs into structured causal tokens processed through specialized neural layers. It achieves significant improvements over prompt-based approaches on multi-image intervention and counterfactual reasoning tasks.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

Researchers propose CLP (Collocation-Length Predictor), a lightweight neural architecture that improves multi-token prediction inference for large language models by eliminating competition between prediction heads and backbone models. The method achieves 1.20x-1.29x speedup on smaller models with zero quality degradation, significantly outperforming existing approaches that suffer from repetitive outputs.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning

Researchers introduce ActiveMem, a distributed memory framework that decouples storage from reasoning in large language models, enabling agents to handle longer tasks without context overload. The system separates executive planning from memory management—inspired by human brain architecture—and demonstrates state-of-the-art performance on complex reasoning benchmarks while reducing computational overhead.

AI · Bullish · arXiv – CS AI · Jun 9 · 7/10

CT-VAM: A Cerebello-Thalamic-Inspired Vision-Action Model for Efficient Visuomotor Control

Researchers introduce CT-VAM, a compact 68M-parameter neural network inspired by cerebellar-thalamic brain architecture for robotic manipulation tasks. The model processes visual inputs and proprioception to predict action sequences efficiently on edge devices, matching larger vision-language-action models while reducing latency and enabling practical deployment on resource-constrained robots.

AI · Bullish · arXiv – CS AI · Jun 8 · 7/10

Don't Pause: Streaming Video-Language Synchrony for Online Video Understanding

Researchers introduce LyraV, a streaming video-language model that keeps video perception and language generation synchronized in real time without pausing. A hierarchical control framework with two components, a Frame-Driven Transition Controller and a Streaming Token Pacer, interleaves video frames with generated tokens at 3.89 FPS with 98.29% synchrony.

AI · Bullish · arXiv – CS AI · Jun 5 · 7/10

Exact Linear Attention

Researchers introduce Exact Linear Attention (ELA), a novel Transformer mechanism that achieves linear computational complexity while eliminating approximation errors in attention calculations. The approach demonstrates significant practical improvements including 6x faster decoding speeds and 75% reduction in KV cache memory, with extensions to vision models showing 4.3x GPU speedup.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

L³: Large Lookup Layers

Researchers introduce Large Lookup Layers (L³), a novel sparse architecture that generalizes embedding tables to decoder layers, enabling more efficient scaling than traditional Mixture-of-Experts models. The approach uses static token-based routing to aggregate learned embeddings contextually, achieving superior performance on language modeling tasks with up to 2.6B active parameters while maintaining hardware efficiency.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

Platonic Transformers: A Solid Choice For Equivariance

Researchers introduce Platonic Transformers, a novel architecture that adds geometric symmetry constraints to standard Transformers without sacrificing computational efficiency. By leveraging symmetry groups from Platonic solids as reference frames for attention mechanisms, the model achieves equivariance to translations and discrete symmetries while maintaining Transformer performance across vision, 3D point clouds, and molecular prediction tasks.

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

Building The Ph(ysical)AI Layer Of Machine Intelligence

Researchers propose principle-driven foundation models that encode physics-based principles rather than learn statistical correlations, achieving cross-modal transfer from radio-frequency data to audio, images, text, and video without fine-tuning. A frozen encoder with 1.99M parameters reaches 77.7% average accuracy across 15 tasks. Performance differs systematically between physically grounded tasks (84.5%) and semantic tasks (70.0%), suggesting complementary approaches to AI generalization.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention

Researchers propose Rank-Factorized Implicit Neural Bias (RIB), a positional encoding method that replaces relative positional bias in Super-Resolution Transformers and makes them compatible with FlashAttention. The method reaches 35.63 dB PSNR on Urban100 (×2) while reducing training and inference time by 2.1× and 2.9×, respectively, addressing a scalability bottleneck in SR model development.

AI · Bullish · arXiv – CS AI · Jun 1 · 7/10

DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training

Researchers introduce DTop-p, a dynamic routing mechanism for Mixture-of-Experts (MoE) architectures that adaptively selects experts based on token difficulty while maintaining controlled computational costs. The approach outperforms traditional Top-k routing and fixed Top-p methods by using a Proportional-Integral controller to dynamically adjust probability thresholds, demonstrating consistent improvements across large language models and diffusion transformers.
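
The idea in the DTop-p summary, Top-p expert selection with a threshold steered by a Proportional-Integral controller, can be sketched roughly as below. This is an illustration under stated assumptions and not the paper's implementation: the names `top_p_select` and `PIController`, the gains, the clamping range, the target of two experts per token, and the router probabilities are all hypothetical.

```python
def top_p_select(probs, p):
    """Pick the fewest experts whose router probabilities sum to at least p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, total = [], 0.0
    for i in order:
        chosen.append(i)
        total += probs[i]
        if total >= p:
            break
    return chosen


class PIController:
    """Textbook PI loop that nudges the threshold p toward a target expert count."""

    def __init__(self, kp, ki, p_init, p_min=0.05, p_max=0.99):
        self.kp, self.ki = kp, ki
        self.p_init, self.p_min, self.p_max = p_init, p_min, p_max
        self.integral = 0.0
        self.p = p_init

    def update(self, target, observed):
        error = target - observed        # positive: too few experts are active
        self.integral += error
        raw = self.p_init + self.kp * error + self.ki * self.integral
        self.p = min(self.p_max, max(self.p_min, raw))  # keep p in a valid range
        return self.p


# Hypothetical router outputs for three tokens over four experts.
batch = [
    [0.70, 0.20, 0.05, 0.05],  # "easy" token: one expert dominates
    [0.40, 0.30, 0.20, 0.10],  # harder token: probability mass is spread out
    [0.25, 0.25, 0.25, 0.25],  # hardest token: uniform router
]

ctrl = PIController(kp=0.05, ki=0.01, p_init=0.9)
for step in range(5):
    counts = [len(top_p_select(probs, ctrl.p)) for probs in batch]
    avg = sum(counts) / len(counts)
    ctrl.update(target=2.0, observed=avg)
    print(step, counts, round(ctrl.p, 3))
```

With a fixed threshold, harder tokens (flatter router distributions) activate more experts; the controller lowers or raises the threshold so that the average count, and therefore compute, stays near the target.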

AI · Bullish · arXiv – CS AI · May 28 · 7/10

Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers

Researchers introduce Tensor Memory, a fixed-size recurrent module that augments Transformers with persistent 3D spatial state for improved long-sequence processing. The approach enables better video understanding and occlusion reasoning by decoupling memory capacity from input length while maintaining computational efficiency.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling

Researchers introduce MP-SSM, a novel framework that integrates State-Space Model principles into message-passing neural networks for improved graph learning. The approach achieves permutation equivariance, computational efficiency, and long-range information propagation while enabling theoretical analysis of gradient flow and information dynamics across deep networks.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models

Researchers propose STARS, a training framework that stabilizes Looped Language Models (LoopLMs) to enable reliable test-time scaling through latent reasoning. The method uses Jacobian Spectral Radius Regularization to constrain neural states toward stable fixed points, addressing a critical problem where model performance peaks then collapses with increased recurrence depth.
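
Why a bounded spectral radius matters for looped computation can be seen in a toy linear recurrence. The sketch below is only an intuition aid, not the STARS method: it uses a fixed 2×2 matrix as a stand-in for the Jacobian of a looped block, and the helper names (`matvec`, `spectral_radius`, `iterate`) are hypothetical.

```python
import math


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def spectral_radius(A, iters=200):
    """Power-iteration estimate; assumes a single dominant real eigenvalue."""
    v = [1.0] * len(A)
    est = 0.0
    for _ in range(iters):
        w = matvec(A, v)
        est = math.sqrt(sum(x * x for x in w))
        if est == 0.0:
            return 0.0
        v = [x / est for x in w]
    return est


def iterate(A, b, steps):
    """Run the linear recurrence h <- A h + b starting from h = 0."""
    h = [0.0] * len(b)
    for _ in range(steps):
        h = [y + c for y, c in zip(matvec(A, h), b)]
    return h


stable = [[0.5, 0.0], [0.0, 0.2]]    # spectral radius 0.5: iterates settle
unstable = [[1.1, 0.0], [0.0, 0.2]]  # spectral radius 1.1: iterates blow up
b = [1.0, 1.0]

print(spectral_radius(stable), iterate(stable, b, 100))
print(spectral_radius(unstable), iterate(unstable, b, 100))
```

When the spectral radius is below 1, extra loop iterations move the state toward a fixed point; above 1, more iterations push the state away, which mirrors the peak-then-collapse behavior described in the summary.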

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Echo-LoRA: Parameter-Efficient Fine-Tuning via Cross-Layer Representation Injection

Echo-LoRA is a parameter-efficient fine-tuning method that injects cross-layer representations from deeper layers into shallow LoRA modules during training, achieving 3% to 5.7% improvements on reasoning tasks without adding inference cost. The auxiliary training path is discarded before deployment, so the method keeps the efficiency of standard LoRA while delivering measurable capability gains.

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

Researchers introduce Toeplitz MLP Mixer (TMM), a transformer alternative that replaces attention mechanisms with triangular-masked Toeplitz matrix multiplication, achieving O(dn log n) training complexity and O(dn) inference complexity. TMMs demonstrate superior training efficiency, information retention, and in-context learning performance compared to existing sub-quadratic architectures.
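
The core operation named in the Toeplitz MLP Mixer summary, mixing tokens with a triangular-masked Toeplitz matrix, can be illustrated in a few lines. This is a minimal sketch, assuming one shared kernel `w` across all feature channels; the function names are hypothetical and the paper's actual parameterization may differ. It uses the direct O(n²) product for clarity; Toeplitz products can also be computed with FFTs in O(n log n), which is consistent with the training complexity quoted above.

```python
def causal_toeplitz(w):
    """Lower-triangular Toeplitz matrix: entry (i, j) is w[i - j] for i >= j, else 0."""
    n = len(w)
    return [[w[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)]


def mix_tokens(w, x):
    """Mix a sequence x (n token vectors of length d) along the sequence axis."""
    T = causal_toeplitz(w)
    n, d = len(x), len(x[0])
    return [[sum(T[i][j] * x[j][k] for j in range(n)) for k in range(d)]
            for i in range(n)]


# With an all-ones kernel the mixer reduces to a running (causal) sum.
x = [[1.0], [2.0], [3.0]]
print(mix_tokens([1.0, 1.0, 1.0], x))  # [[1.0], [3.0], [6.0]]

# Causality check: changing the last token leaves earlier outputs untouched.
x_changed = [[1.0], [2.0], [99.0]]
print(mix_tokens([1.0, 1.0, 1.0], x_changed)[:2])  # [[1.0], [3.0]]
```

Because each row of the matrix is a shifted copy of the same kernel, the layer needs only n mixing weights instead of n², and the triangular mask guarantees that position i never reads from later positions.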

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Normalized Architectures are Natively 4-Bit

Researchers demonstrate that nGPT, a neural architecture that normalizes weights and hidden representations to a unit hypersphere, achieves stable 4-bit precision training without requiring additional quantization interventions. The approach leverages mathematical properties of dot products to maintain stronger signal-to-noise ratios, enabling efficient training of models up to 30B parameters.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise

Researchers demonstrate that expert specialization in Mixture-of-Experts (MoE) large language models emerges from hidden-state geometry rather than from a specialized routing architecture, challenging assumptions about how these systems work. Expert routing patterns resist human interpretation across models and tasks, suggesting that understanding MoE specialization remains as difficult as the broader unsolved problem of interpreting LLM internal representations.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning

Researchers introduce GRIP, a unified framework that integrates retrieval decisions directly into language model generation through control tokens, eliminating the need for external retrieval controllers. The system lets models decide autonomously when to retrieve information, reformulate queries, and terminate retrieval within a single autoregressive process, achieving performance competitive with GPT-4o while using substantially fewer parameters.

🧠 GPT-4

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Zero-shot World Models Are Developmentally Efficient Learners

Researchers introduce Zero-shot Visual World Models (ZWM), a computational framework inspired by how young children learn physical understanding from minimal data. The approach combines sparse prediction, causal inference, and compositional reasoning to achieve data-efficient learning, demonstrating that AI systems can match child development patterns while learning from single-child observational data.

Page 1 of 4