y0news

#neural-architecture News & Analysis

46 articles tagged with #neural-architecture. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

Rank-Factorized Implicit Neural Bias: Scaling Super-Resolution Transformer with FlashAttention

Researchers propose Rank-Factorized Implicit Neural Bias (RIB), a positional encoding method that replaces the relative positional bias in Super-Resolution (SR) Transformers so that the models can run on FlashAttention's fused attention kernels. The authors report 35.63 dB PSNR on Urban100 at ×2 upscaling while cutting training and inference time by 2.1× and 2.9×, respectively, addressing a key scalability bottleneck in SR model development.
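
Why a rank-factorized bias can coexist with FlashAttention can be illustrated with a small sketch. Assuming the bias matrix factors as B = U Vᵀ with a small rank r, the biased logits QKᵀ + UVᵀ equal the plain logits of augmented queries [Q | U] and keys [K | V], so a kernel that accepts no additive bias can still compute them. The factor values below are hypothetical, and the sketch does not reproduce how RIB produces U and V from positions.

```python
def matmul_t(a, b):
    """Return a @ b^T for row-major lists of lists."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]

def add(m1, m2):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def concat_cols(a, b):
    """Append the columns of b to a, row by row."""
    return [ra + rb for ra, rb in zip(a, b)]

# Toy inputs: 3 positions, head dim 2, bias rank 1 (all values arbitrary).
Q = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
K = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]
U = [[0.1], [0.2], [0.3]]   # hypothetical per-query bias factors
V = [[1.0], [-1.0], [0.5]]  # hypothetical per-key bias factors

# Explicit biased logits: Q K^T + U V^T (materializes the full n x n bias).
explicit = add(matmul_t(Q, K), matmul_t(U, V))

# Same logits from augmented vectors, with no bias matrix at all.
# A real kernel also scales logits by 1/sqrt(d); the factors would need
# rescaling to compensate, which this sketch omits.
fused = matmul_t(concat_cols(Q, U), concat_cols(K, V))
```

The extra cost is r additional dimensions per head instead of an n × n bias tensor, which is what lets the bias scale with sequence length.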

AI · Bullish · arXiv – CS AI · 2d ago · 7/10

DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training

Researchers introduce DTop-p, a dynamic routing mechanism for Mixture-of-Experts (MoE) architectures that adapts the number of experts each token activates to the token's difficulty while keeping overall sparsity near a target. A Proportional-Integral (PI) controller dynamically adjusts the top-p probability threshold; the approach outperforms Top-k routing and fixed-threshold Top-p routing, with consistent improvements across large language models and diffusion transformers.
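
The mechanism can be sketched in two parts: top-p routing, which keeps the smallest set of experts whose router probabilities reach p, and a PI controller that nudges p so the average number of active experts tracks a target. The controller gains, clamping range, and update rule below are assumptions for illustration, not the paper's exact formulation.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_experts(logits, p):
    """Smallest set of experts (by descending probability) whose mass reaches p."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, mass = [], 0.0
    for i in order:
        chosen.append(i)
        mass += probs[i]
        if mass >= p:
            break
    return chosen

class PIThreshold:
    """Hypothetical PI controller steering p toward a target mean expert count."""

    def __init__(self, target_k, p_init=0.5, kp=0.05, ki=0.01):
        self.target_k, self.p = target_k, p_init
        self.kp, self.ki, self.integral = kp, ki, 0.0

    def update(self, mean_active):
        error = self.target_k - mean_active  # positive: too few experts active
        self.integral += error
        self.p += self.kp * error + self.ki * self.integral
        self.p = min(max(self.p, 0.01), 1.0)  # keep p a valid threshold
        return self.p

# One controller step over a toy batch of router logits.
batch = [[4.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.2, 0.0]]
ctrl = PIThreshold(target_k=2.0)
mean_active = sum(len(top_p_experts(t, ctrl.p)) for t in batch) / len(batch)
ctrl.update(mean_active)
```

Confident tokens end up with few experts and ambiguous ones with more, while the controller keeps the batch average close to the compute budget.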

AI · Bullish · arXiv – CS AI · 6d ago · 7/10

Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers

Researchers introduce Tensor Memory, a fixed-size recurrent module that augments Transformers with persistent 3D spatial state for improved long-sequence processing. The approach enables better video understanding and occlusion reasoning by decoupling memory capacity from input length while maintaining computational efficiency.

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models

Researchers propose STARS, a training framework that stabilizes Looped Language Models (LoopLMs) so that their latent reasoning scales reliably at test time. The method applies Jacobian Spectral Radius Regularization to push hidden-state dynamics toward stable fixed points, addressing a failure mode in which performance peaks and then collapses as recurrence depth increases.
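
A Jacobian spectral-radius penalty needs an estimate of the dominant eigenvalue magnitude of the looped update's Jacobian. A minimal sketch, assuming power iteration with finite-difference Jacobian-vector products and a hinge-squared penalty above a target of 1 (both choices are assumptions; STARS may estimate and penalize the radius differently):

```python
import math

def jvp_fd(f, h, v, eps=1e-6):
    """Finite-difference Jacobian-vector product J_f(h) @ v."""
    shifted = f([hi + eps * vi for hi, vi in zip(h, v)])
    base = f(h)
    return [(s - b) / eps for s, b in zip(shifted, base)]

def spectral_radius_estimate(f, h, iters=50):
    """Power iteration on the Jacobian of f at h (assumes one dominant real eigenvalue)."""
    n = len(h)
    v = [1.0 / math.sqrt(n)] * n  # unit-norm starting vector
    rho = 0.0
    for _ in range(iters):
        w = jvp_fd(f, h, v)
        rho = math.sqrt(sum(x * x for x in w))  # ||J v|| with ||v|| = 1
        if rho == 0.0:
            return 0.0
        v = [x / rho for x in w]
    return rho

def spectral_penalty(rho, target=1.0):
    """Hypothetical regularizer: penalize only radii above the stability target."""
    return max(0.0, rho - target) ** 2

# Toy looped update with Jacobian diag(0.5, 2.0): one unstable direction.
def looped_step(h):
    return [0.5 * h[0], 2.0 * h[1]]

rho = spectral_radius_estimate(looped_step, [0.3, -0.2])
```

A radius above 1 means small perturbations grow with every loop iteration, which is one plausible reading of why deeper recurrence can collapse without such a constraint.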

AI · Bullish · arXiv – CS AI · May 27 · 7/10

Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling

Researchers introduce MP-SSM, a novel framework that integrates State-Space Model principles into message-passing neural networks for improved graph learning. The approach achieves permutation equivariance, computational efficiency, and long-range information propagation while enabling theoretical analysis of gradient flow and information dynamics across deep networks.

AI · Bullish · arXiv – CS AI · May 12 · 7/10

Echo-LoRA: Parameter-Efficient Fine-Tuning via Cross-Layer Representation Injection

Echo-LoRA is a parameter-efficient fine-tuning method that, during training, injects representations from deeper layers into shallow LoRA modules, yielding reported improvements of 3-5.7% on reasoning tasks without adding inference cost. Because the auxiliary training path is discarded after training, the deployed model keeps the efficiency of standard LoRA while retaining the capability gains.
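
The "no added inference cost" property rests on standard LoRA, where the low-rank update can be folded into the frozen weight after training. A minimal sketch of that base mechanism (the forward pass W x + (alpha / r) B A x and the merge); Echo-LoRA's cross-layer injection path is not shown, and the toy matrices are arbitrary.

```python
def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def matmul(a, b):
    """a (n x k) @ b (k x m) for lists of lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def lora_forward(W, A, B, x, alpha, r):
    """Frozen weight W plus the low-rank update (alpha / r) * B @ A, applied to x."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

def merge_lora(W, A, B, alpha, r):
    """Fold the adapter into W so inference costs the same as the base layer."""
    BA = matmul(B, A)
    return [[w + (alpha / r) * d for w, d in zip(rw, rd)] for rw, rd in zip(W, BA)]

W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen 2x3 weight
A = [[0.1, 0.2, 0.3]]                    # r x d_in, with r = 1
B = [[1.0], [-1.0]]                      # d_out x r
alpha, r = 2.0, 1
x = [1.0, 2.0, 3.0]

y_adapter = lora_forward(W, A, B, x, alpha, r)   # training-time path
y_merged = matvec(merge_lora(W, A, B, alpha, r), x)  # deployment path
```

Any training-only branch that only shapes A and B, as the summary describes for Echo-LoRA, can be dropped before this merge without changing the deployed layer's cost.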

AI · Bullish · arXiv – CS AI · May 11 · 7/10

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

Researchers introduce the Toeplitz MLP Mixer (TMM), a Transformer alternative that replaces attention with triangular-masked Toeplitz matrix multiplication, achieving O(dn log n) training complexity and O(dn) inference complexity. The authors report better training efficiency, information retention, and in-context learning performance than existing sub-quadratic architectures.
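
A lower-triangular Toeplitz matrix applied to a sequence is a causal convolution, which is where an n log n term can come from: the product can be computed with a zero-padded FFT instead of an n × n matrix. A minimal single-channel sketch, assuming an FFT-based implementation (a model with d channels repeats this per channel, giving the d factor); the function names are hypothetical.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Inverse is unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def causal_toeplitz_fft(t, x):
    """y[i] = sum_{j<=i} t[i-j] * x[j] via zero-padded FFT convolution, O(n log n)."""
    n = len(x)
    size = 1
    while size < 2 * n:  # padding to >= 2n avoids circular wrap-around
        size *= 2
    ft = fft(t + [0.0] * (size - n))
    fx = fft(x + [0.0] * (size - n))
    y = fft([a * b for a, b in zip(ft, fx)], invert=True)
    return [(v / size).real for v in y[:n]]

def causal_toeplitz_dense(t, x):
    """Reference: build the lower-triangular Toeplitz matrix explicitly, O(n^2)."""
    n = len(x)
    T = [[t[i - j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    return [sum(T[i][j] * x[j] for j in range(n)) for i in range(n)]

t = [1.0, 0.5, 0.25, 0.125]  # one channel's Toeplitz coefficients (arbitrary)
x = [1.0, 2.0, 3.0, 4.0]
y_dense = causal_toeplitz_dense(t, x)
y_fft = causal_toeplitz_fft(t, x)
```

The triangular mask is what makes the mixing causal: output i never reads inputs after position i.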

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Normalized Architectures are Natively 4-Bit

Researchers show that nGPT, an architecture that normalizes weights and hidden representations onto the unit hypersphere, trains stably in 4-bit precision without additional quantization interventions. The authors attribute this to properties of dot products between normalized vectors, which preserve a stronger signal-to-noise ratio, and demonstrate training at scales of up to 30B parameters.
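
One intuition for why normalization helps low precision: unit-norm vectors have entries in [-1, 1] and their dot products are cosine similarities in [-1, 1], so a coarse grid covers the whole range. A minimal sketch, assuming symmetric per-vector integer quantization to [-7, 7]; this 4-bit format is an illustrative assumption, not nGPT's training recipe.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def quantize_int4(v):
    """Symmetric per-vector quantization to integers in [-7, 7] plus one scale."""
    scale = max(abs(x) for x in v) / 7.0
    if scale == 0.0:
        return [0] * len(v), 1.0
    return [max(-7, min(7, round(x / scale))) for x in v], scale

def dequantize(q, scale):
    return [qi * scale for qi in q]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = l2_normalize([0.3, -1.2, 0.8, 2.0])
b = l2_normalize([1.0, 0.4, -0.5, 1.5])
qa, sa = quantize_int4(a)
qb, sb = quantize_int4(b)
exact = dot(a, b)  # cosine similarity, always in [-1, 1]
approx = dot(dequantize(qa, sa), dequantize(qb, sb))
```

Because every entry is at most 1 in magnitude, the per-element rounding error is at most 1/14 regardless of the raw activation scale.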

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Zero-shot World Models Are Developmentally Efficient Learners

Researchers introduce Zero-shot Visual World Models (ZWM), a computational framework inspired by how young children acquire physical understanding from minimal data. The approach combines sparse prediction, causal inference, and compositional reasoning for data-efficient learning, and the authors show that it can match patterns of child development while learning only from observational data recorded from a single child.

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise

Researchers show that expert specialization in Mixture-of-Experts (MoE) large language models emerges from the geometry of hidden states rather than from a specialized routing mechanism, challenging common assumptions about how these systems work. Expert routing patterns resist human interpretation across models and tasks, suggesting that explaining MoE specialization is as hard as the broader unsolved problem of interpreting LLM internal representations.
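
Why routing tracks geometry is easy to see in a typical MoE router: a linear map from the hidden state to one logit per expert, followed by top-k selection. Nearby hidden states therefore receive the same experts whatever token or domain produced them. A toy sketch with hypothetical router weights:

```python
def route_top_k(h, router_weights, k=2):
    """Linear router: one logit per expert (dot product with h), keep the k largest."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in router_weights]
    order = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
    return sorted(order[:k])

# Hypothetical 4-expert router over 3-dimensional hidden states.
router = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, -1.0, 0.0],
]
h1 = [0.9, 0.8, 0.1]
h2 = [0.85, 0.82, 0.12]  # a nearby hidden state, whatever its source token
h3 = [-0.5, -0.7, 0.9]   # a state in a different region of the space
```

Expert assignment partitions hidden-state space into regions; whether those regions line up with human-recognizable domains depends on how the model lays out its representations.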

AI · Neutral · arXiv – CS AI · Apr 14 · 7/10

Universal statistical signatures of evolution in artificial intelligence architectures

A comprehensive study analyzing 935 ablation experiments from 161 publications reveals that artificial intelligence architectural evolution follows the same statistical laws as biological evolution, with a heavy-tailed distribution of fitness effects placing AI between viral genomes and simple organisms. The findings suggest that evolutionary statistical structure is substrate-independent and determined by fitness landscape topology rather than the underlying selection mechanism.

AI · Bullish · arXiv – CS AI · Apr 14 · 7/10

Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning

Researchers introduce GRIP, a unified framework that integrates retrieval decisions directly into language model generation through control tokens, eliminating the need for an external retrieval controller. The model decides on its own when to retrieve, how to reformulate queries, and when to stop retrieving, all within a single autoregressive process, and achieves performance competitive with GPT-4o while using substantially fewer parameters.
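
A minimal sketch of retrieval driven by control tokens inside one decoding loop. The tag strings, the `generate` and `retrieve` callables, and the scripted model outputs are all hypothetical stand-ins; GRIP's actual token vocabulary and training are not reproduced here.

```python
RETRIEVE, END_QUERY, ANSWER = "<retrieve>", "</retrieve>", "<answer>"  # hypothetical tags

def run_with_retrieval(generate, retrieve, prompt, max_rounds=3):
    """Single loop: the model itself emits retrieval requests as ordinary text."""
    context = prompt
    for _ in range(max_rounds):
        output = generate(context)
        if output.startswith(RETRIEVE) and END_QUERY in output:
            query = output[len(RETRIEVE):output.index(END_QUERY)].strip()
            # Append the request and the evidence, then keep decoding.
            context += output + "\n" + retrieve(query) + "\n"
        else:
            if output.startswith(ANSWER):
                output = output[len(ANSWER):]
            return output.strip()
    return None  # retrieval budget exhausted without an answer

# Stub model that first asks for evidence, then answers.
script = iter([
    "<retrieve> capital of Australia </retrieve>",
    "<answer> Canberra",
])
docs = {"capital of Australia": "Canberra is the capital of Australia."}
answer = run_with_retrieval(
    lambda ctx: next(script),
    lambda q: docs.get(q, ""),
    "Q: What is the capital of Australia?\n",
)
```

Because the retrieve/answer decision is just another token choice, no separate controller model is needed; the loop only parses what the model emits.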

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Do We Need Distinct Representations for Every Speech Token? Unveiling and Exploiting Redundancy in Large Speech Language Models

Researchers show that large speech language models contain substantial redundancy in their token representations, particularly in deeper layers. Affinity Pooling, a training-free token-merging technique, cuts prefilling FLOPs by 27.48% and saves up to 1.7× memory while preserving semantic accuracy, calling into question whether acoustic processing needs a fully distinct representation for every token.
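
Training-free token merging can be sketched as: walk the sequence and average consecutive token vectors whose cosine similarity exceeds a threshold. The adjacent-merge rule and the 0.95 threshold below are assumptions for illustration; Affinity Pooling's actual affinity criterion may differ.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def merge_adjacent(tokens, threshold=0.95):
    """Greedily average runs of consecutive tokens whose similarity meets threshold."""
    merged, counts = [list(tokens[0])], [1]
    for tok in tokens[1:]:
        if cosine(merged[-1], tok) >= threshold:
            c = counts[-1]
            # Running mean keeps each merged token the average of its run.
            merged[-1] = [(m * c + t) / (c + 1) for m, t in zip(merged[-1], tok)]
            counts[-1] = c + 1
        else:
            merged.append(list(tok))
            counts.append(1)
    return merged, counts

# Five toy frame embeddings forming two near-duplicate runs.
toks = [[1.0, 0.0], [0.99, 0.05], [1.0, 0.02], [0.0, 1.0], [0.02, 1.0]]
merged, counts = merge_adjacent(toks)
```

Since prefill cost grows with sequence length, shrinking five tokens to two here directly reduces the work done by every later layer.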

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

SRAM-Based Compute-in-Memory Accelerator for Linear-decay Spiking Neural Networks

Researchers developed an SRAM-based compute-in-memory accelerator for spiking neural networks that approximates exponential membrane decay with linear decay, cutting energy consumption by 1.1× to 16.7×. The design addresses the neuron-state-update bottleneck in neuromorphic computing by performing the decay in place, directly within the memory arrays.
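
The difference between the two decay models fits in a few lines: exponential decay multiplies the membrane potential by a constant factor each step, while linear decay subtracts a constant leak, which needs only subtraction and comparison. A minimal leaky integrate-and-fire sketch; the leak, time constant, threshold, and clamp at zero are hypothetical choices, not the accelerator's parameters.

```python
import math

def lif_exponential(v, input_current, tau=20.0, dt=1.0):
    """Exponential leak: v shrinks by a factor exp(-dt / tau) each step (a multiply)."""
    return v * math.exp(-dt / tau) + input_current

def lif_linear(v, input_current, leak=0.05):
    """Linear-decay approximation: subtract a constant leak, clamped at 0 (assumed)."""
    return max(v - leak, 0.0) + input_current

def run(neuron_step, inputs, threshold=1.0):
    """Integrate inputs; spike and reset to 0 when v reaches the threshold."""
    v, spikes = 0.0, []
    for x in inputs:
        v = neuron_step(v, x)
        if v >= threshold:
            spikes.append(1)
            v = 0.0
        else:
            spikes.append(0)
    return spikes

spikes_linear = run(lif_linear, [0.4, 0.4, 0.4])
spikes_exp = run(lif_exponential, [0.4, 0.4, 0.4])
```

Replacing the per-step multiply with a subtraction is the kind of simplification that makes an in-memory update of every neuron's state cheap.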

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Architectural Proprioception in State Space Models: Thermodynamic Training Induces Anticipatory Halt Detection

Researchers introduce the Probability Navigation Architecture (PNA), a framework that trains State Space Models (SSMs) using thermodynamic principles, and find that the resulting SSMs develop "architectural proprioception": the ability to predict when to stop computing from the entropy of their internal state. The authors report that SSMs acquire this anticipatory halt detection while Transformers do not, a result with potential implications for efficient inference.
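
What "stopping based on internal state entropy" means mechanically can be illustrated with an explicit rule: iterate a state update and halt once the entropy of the state's softmax falls below a threshold. This hand-written rule, its threshold, and the toy update are hypothetical; in PNA the halting signal is something the trained model comes to anticipate, not a fixed check.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def run_until_confident(step, state, threshold=0.3, max_steps=10):
    """Apply step repeatedly; halt when the state's softmax entropy drops below threshold."""
    for t in range(1, max_steps + 1):
        state = step(state)
        if entropy(softmax(state)) < threshold:
            return state, t
    return state, max_steps

# Toy update that sharpens the state each step; halts after 2 steps here.
final_state, steps_used = run_until_confident(lambda s: [2 * x for x in s], [1.0, 0.0, 0.0])
```

Halting early on low-entropy states is what would save compute at inference; the paper's claim is that SSMs learn to predict this point in advance.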

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4

Learning Internal Biological Neuron Parameters and Complexity-Based Encoding for Improved Spiking Neural Networks Performance

Researchers developed a learning approach for spiking neural networks that optimizes both synaptic weights and intrinsic neuronal parameters, improving classification accuracy by up to 13.50 percentage points. The study also introduces SNN-LZC, a biologically inspired classifier that reaches 99.50% accuracy with sub-millisecond inference latency.
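
Assuming "LZC" stands for Lempel-Ziv complexity, as is common for complexity-based encodings, the measure counts how many new substrings are needed to parse a binary sequence such as a spike train. A sketch of the Kaspar-Schuster algorithm for LZ76 complexity; how the paper turns this count into an input encoding is not shown.

```python
def lz76_complexity(s):
    """Number of distinct phrases in the LZ76 parsing of s (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n == 0:
        return 0
    if n == 1:
        return 1
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1  # current phrase still copies an earlier substring
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # no earlier start reproduces it: close a new phrase
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

# Classic example: 0 | 001 | 10 | 100 | 1000 | 101 gives 6 phrases.
spike_train = "0001101001000101"
complexity = lz76_complexity(spike_train)
```

Regular spike trains parse into few phrases and irregular ones into many, so the count summarizes temporal structure in a single number.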

AI · Neutral · arXiv – CS AI · 1d ago · 5/10

Enhancing BiGRU with a KAN Block for Legal Document Classification and Summarization

Researchers have developed a novel neural architecture combining Kolmogorov-Arnold Networks (KAN) with BiGRU models for classifying and summarizing legal documents in multilingual, low-resource settings. Tested on Bengali, English, and transliterated Bengali legal documents from Bangladesh, the hybrid model achieved 67.96% classification accuracy while demonstrating that KAN integration improved performance by over 10 percentage points.
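
In a Kolmogorov-Arnold Network layer, each input-output edge carries its own learnable one-dimensional function, and each output is the sum of those functions applied to the inputs. A toy sketch that swaps in Gaussian radial basis functions for the B-splines of the original KAN formulation; how the paper wires the KAN block onto BiGRU features is not shown, and the class name is hypothetical.

```python
import math

class RBFKANLayer:
    """Toy KAN layer: output j = sum_i phi_ij(x_i), each phi_ij a learnable 1-D function.

    phi_ij is a weighted sum of Gaussian bumps on a fixed grid (an RBF variant);
    the original KAN formulation uses B-splines instead.
    """

    def __init__(self, coeffs, grid, width=1.0):
        self.coeffs = coeffs  # coeffs[j][i][k]: weight of bump k on edge i -> j
        self.grid = grid
        self.width = width

    def basis(self, x):
        return [math.exp(-((x - g) / self.width) ** 2) for g in self.grid]

    def __call__(self, x):
        feats = [self.basis(xi) for xi in x]  # one basis expansion per input
        return [
            sum(sum(c * b for c, b in zip(edge, feats[i])) for i, edge in enumerate(out_coeffs))
            for out_coeffs in self.coeffs
        ]

# 2 inputs -> 1 output, one bump per edge centered at 0.
layer = RBFKANLayer(coeffs=[[[1.0], [2.0]]], grid=[0.0])
out_at_origin = layer([0.0, 0.0])
out_shifted = layer([1.0, 0.0])
```

Training the coefficients shapes each edge's nonlinearity directly, in contrast to an MLP, where nonlinearities are fixed and only the linear weights are learned.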

AI · Neutral · arXiv – CS AI · 1d ago · 6/10

You Can Learn Tokenization End-to-End with Reinforcement Learning

Researchers propose learning the token boundaries of large language models end to end with reinforcement learning, using score-function gradient estimates instead of a fixed, compression-based tokenizer. Optimizing the discrete boundaries directly outperforms prior straight-through estimation methods at the 100-million-parameter scale.
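
The score-function (REINFORCE) estimator handles discrete boundaries without a straight-through approximation: sample a boundary decision at each position from sigmoid(logit), then weight the gradient of the log-probability by the reward. For a Bernoulli sample b, d log p(b) / d logit = b - sigmoid(logit). A minimal sketch in which the per-byte decisions, reward, and baseline are hypothetical (a real setup would derive the reward from the language model's loss).

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample_boundaries(logits, rng):
    """Independently decide, per position, whether a token boundary goes there."""
    return [1 if rng.random() < sigmoid(z) else 0 for z in logits]

def reinforce_grad(logits, boundaries, reward, baseline=0.0):
    """Score-function estimate of d E[reward] / d logits for Bernoulli boundaries."""
    adv = reward - baseline
    return [adv * (b - sigmoid(z)) for z, b in zip(logits, boundaries)]

def segment(text, boundaries):
    """Split text after every position marked 1."""
    pieces, start = [], 0
    for i, b in enumerate(boundaries):
        if b:
            pieces.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        pieces.append(text[start:])
    return pieces

rng = random.Random(0)
text = "hello world"
logits = [0.0] * len(text)  # untrained: every boundary is a coin flip
sampled = sample_boundaries(logits, rng)
grad = reinforce_grad(logits, sampled, reward=1.0, baseline=0.5)
```

Subtracting a baseline does not bias the estimate but reduces its variance, which matters when every sequence contains many boundary decisions.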

AI · Bullish · arXiv – CS AI · 1d ago · 6/10

Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

Researchers introduce Harness-1, a 20B parameter search agent that separates semantic decision-making from state management by externalizing working memory to a stateful harness environment. The system achieves 73% average curated recall across eight retrieval benchmarks, outperforming comparable open-source searchers by 11.4 points while generalizing well to held-out transfer tasks.

AI · Bullish · arXiv – CS AI · 1d ago · 6/10

Consistency Deep Equilibrium Models

Researchers introduce Consistency Deep Equilibrium Models (C-DEQ), a framework that uses consistency distillation to speed up inference in Deep Equilibrium Models (DEQs), reporting 2-20× accuracy improvements under few-step inference budgets. This targets the main weakness of DEQs, their slow inference, while preserving the memory efficiency that makes them attractive for deep learning applications.
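
The bottleneck C-DEQ targets is visible in a plain DEQ forward pass: the output is the fixed point z* = f(z*, x), usually found by iterating until convergence, and a tight step budget leaves a large residual. A toy scalar sketch using a contraction with a known equilibrium; the consistency-distilled model that jumps from an early iterate toward z* is not shown.

```python
def fixed_point(f, x, z0, max_steps=50, tol=1e-8):
    """Naive DEQ forward pass: iterate z <- f(z, x) until the update is below tol."""
    z = z0
    for step in range(1, max_steps + 1):
        z_next = f(z, x)
        if abs(z_next - z) < tol:
            return z_next, step
        z = z_next
    return z, max_steps

def f(z, x):
    """Toy contraction with known equilibrium z* = 2x."""
    return 0.5 * z + x

z_full, steps_full = fixed_point(f, 1.0, 0.0)             # converges near 2.0
z_few, steps_few = fixed_point(f, 1.0, 0.0, max_steps=3)  # few-step budget: 1.75
```

Even this easy contraction needs 28 iterations to reach the tolerance, while three steps stop well short of the equilibrium; closing that gap in a few steps is what the distillation is for.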

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Lumos-Nexus: Efficient Frequency Bridging with Homogeneous Latent Space for Video Unified Models

Lumos-Nexus is a new video generation framework that separates training and inference to improve both reasoning quality and visual fidelity. The system uses a lightweight generator during training and progressively hands off to a high-capacity generator during inference through a technique called Unified Progressive Frequency Bridging, while introducing VR-Bench as a benchmark for reasoning-driven video generation.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Unicorn: Scaling High-Dimensional Time Series Forecasting via Universal Correlation Modeling

Researchers introduce Unicorn, a universal correlation network that addresses a key limitation in time series forecasting by enabling models to scale across high-dimensional datasets while capturing inter-channel dependencies. The framework uses a latent prototype codebook to learn identity-agnostic patterns that transfer across diverse domains, significantly outperforming existing architectures in few-shot transfer scenarios.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

CobSeg: Coherence Boundary Modeling for Dialogue Topic Segmentation

CobSeg introduces a novel multi-branch architecture for dialogue topic segmentation that separates semantic continuity from lexical boundary transitions, achieving significant performance improvements across five benchmarks without requiring LLM calls during inference. The approach demonstrates particular strength in scenarios where local lexical cues are prominent, reducing error metrics substantially in both supervised and pseudo-label settings.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10

Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning

Researchers provide theoretical foundations for why linear recurrent neural networks excel as memory units in partially observable reinforcement learning environments. The study demonstrates that linear filters can exactly reproduce belief vectors in hidden Markov models under deterministic conditions and nearly eliminate state ambiguity, offering mathematical justification for their empirical success.
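
The link between linear recurrence and belief tracking shows up in the standard HMM forward filter: before normalization, the update alpha'[j] = O[j][obs] · Σ_i alpha[i] · T[i][j] is a linear map of alpha for each observation, and the belief is alpha divided by its sum. A toy sketch with a deterministic two-state transition model; the paper's exact conditions and constructions are not reproduced.

```python
def linear_filter_step(alpha, T, O, obs):
    """Unnormalized HMM forward update; linear in alpha for a fixed observation."""
    n = len(alpha)
    return [O[j][obs] * sum(alpha[i] * T[i][j] for i in range(n)) for j in range(n)]

def belief(alpha):
    """Normalize the unnormalized forward vector into a probability distribution."""
    total = sum(alpha)
    return [a / total for a in alpha]

# Two hidden states with deterministic transitions (0 -> 1, 1 -> 0) and noisy observations.
T = [[0.0, 1.0], [1.0, 0.0]]
O = [[0.8, 0.2], [0.3, 0.7]]  # O[state][observation]
alpha = [0.5, 0.5]            # uniform prior
for obs in [0, 1, 0]:
    alpha = linear_filter_step(alpha, T, O, obs)
b = belief(alpha)  # roughly [0.961, 0.039]: state 0 is now far more likely
```

Because normalization can be deferred to the readout, a linear recurrent cell can in principle carry the same information as the exact belief state.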

AI · Neutral · arXiv – CS AI · 5d ago · 6/10

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

Researchers introduce the Cognitive Categorical Transformer (CCT), a 306M-parameter language model built on category-theoretic inductive biases that achieves a 12% relative perplexity reduction on WikiText-103 over a GPT-2 Small baseline. The work provides empirical evidence that simplicial message passing improves language modeling and distinguishes topology-adding from consistency-enforcing categorical priors.

Page 1 of 2