y0news

#neural-networks News & Analysis

358 articles tagged with #neural-networks. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Mar 27 · 7/10

Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

Ming-Flash-Omni is a new 100-billion-parameter multimodal AI model with a Mixture-of-Experts architecture that activates only 6.1 billion parameters per token. The model demonstrates unified capabilities across vision, speech, and language tasks, achieving performance comparable to Gemini 2.5 Pro on vision-language benchmarks.

🧠 Gemini
AI · Neutral · arXiv – CS AI · Mar 26 · 7/10

Evidence of an Emergent "Self" in Continual Robot Learning

Researchers propose a method to identify 'self-awareness' in AI systems by analyzing invariant cognitive structures that remain stable during continual learning. Their study found that robots subjected to continual learning developed significantly more stable subnetworks compared to control groups, suggesting this could be evidence of an emergent 'self' concept.

AI · Bullish · arXiv – CS AI · Mar 26 · 7/10

Moonwalk: Inverse-Forward Differentiation

Researchers introduce Moonwalk, a new algorithm that solves backpropagation's memory limitations by eliminating the need to store intermediate activations during neural network training. The method uses vector-inverse-Jacobian products and submersive networks to reconstruct gradients in a forward sweep, enabling training of networks more than twice as deep under the same memory constraints.

AI · Bearish · arXiv – CS AI · Mar 17 · 7/10

Cheating Stereo Matching in Full-scale: Physical Adversarial Attack against Binocular Depth Estimation in Autonomous Driving

Researchers have developed the first physical adversarial attack targeting stereo-based depth estimation in autonomous vehicles, using 3D camouflaged objects that can fool binocular vision systems. The attack employs global texture patterns and a novel merging technique to create nearly invisible threats that cause stereo matching models to produce incorrect depth information.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models

Researchers introduce Distributional Semantics Tracing (DST), a new framework for explaining hallucinations in large language models by tracking how semantic representations drift across neural network layers. The method reveals that hallucinations occur when models are pulled toward contextually inconsistent concepts based on training correlations rather than actual prompt context.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

In-Context Symbolic Regression for Robustness-Improved Kolmogorov-Arnold Networks

Researchers developed new methods for extracting symbolic formulas from Kolmogorov-Arnold Networks (KANs), addressing a key bottleneck in making AI models more interpretable. The proposed Greedy in-context Symbolic Regression (GSR) and Gated Matching Pursuit (GMP) methods reduced test error by up to 99.8% while improving robustness.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

Why Inference in Large Models Becomes Decomposable After Training

Researchers have discovered that large AI models develop decomposable internal structures during training, with many parameter dependencies remaining statistically unchanged from initialization. They propose a post-training method to identify and remove unsupported dependencies, enabling parallel inference without modifying model functionality.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

RESQ: A Unified Framework for REliability- and Security Enhancement of Quantized Deep Neural Networks

Researchers propose RESQ, a three-stage framework that enhances both security and reliability of quantized deep neural networks through specialized fine-tuning techniques. The framework demonstrates up to 10.35% improvement in attack resilience and 12.47% in fault resilience while maintaining competitive accuracy across multiple neural network architectures.

AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

The Phenomenology of Hallucinations

Researchers discovered that AI language models hallucinate not because they fail to detect uncertainty, but because they cannot integrate uncertainty signals into output generation. The study shows that models can identify uncertain inputs internally, but these signals become geometrically amplified yet functionally silent due to weak coupling with the output layers.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

Directional Routing in Transformers

Researchers introduce directional routing, a lightweight mechanism for transformer models that adds only 3.9% parameter cost but significantly improves performance. The technique gives attention heads learned suppression directions controlled by a shared router, reducing perplexity by 31-56% and becoming the dominant computational pathway in the model.

🏢 Perplexity
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

OrthoFormer: Instrumental Variable Estimation in Transformer Hidden States via Neural Control Functions

Researchers propose OrthoFormer, a new Transformer architecture that addresses causal learning limitations by embedding instrumental variable estimation directly into neural networks. The framework aims to distinguish between spurious correlations and true causal mechanisms, potentially improving AI model robustness and reliability under distribution shifts.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

SCAN: Sparse Circuit Anchor Interpretable Neuron for Lifelong Knowledge Editing

Researchers introduce SCAN, a new framework for editing Large Language Models that prevents catastrophic forgetting during sequential knowledge updates. The method uses sparse circuit manipulation instead of dense parameter changes, maintaining model performance even after 3,000 sequential edits across major models like Gemma2, Qwen3, and Llama3.1.

🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration

Researchers propose LESA, a new framework that accelerates Diffusion Transformers (DiTs) by up to 6.25x using learnable predictors and Kolmogorov-Arnold Networks. The method achieves significant speedups while maintaining or improving generation quality in text-to-image and text-to-video synthesis tasks.

AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

SPARQ: Spiking Early-Exit Neural Networks for Energy-Efficient Edge AI

SPARQ introduces a unified framework combining spiking neural networks (SNNs), quantization-aware training, and reinforcement learning-guided early exits for energy-efficient edge AI. The system achieves up to 5.15% higher accuracy than conventional quantized SNNs while reducing system energy consumption by a factor of more than 330 and cutting synaptic operations by over 90%.

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

AI Model Modulation with Logits Redistribution

Researchers propose AIM, a novel AI model modulation paradigm that allows a single model to exhibit diverse behaviors without maintaining multiple specialized versions. The approach uses logits redistribution to enable dynamic control over output quality and input feature focus without requiring retraining or additional training data.

🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights

Researchers discovered that privacy vulnerabilities in neural networks exist in only a small fraction of weights, but these same weights are critical for model performance. They developed a new approach that preserves privacy by rewinding and fine-tuning only these critical weights instead of retraining entire networks, maintaining utility while defending against membership inference attacks.

AI · Neutral · arXiv – CS AI · Mar 16 · 7/10

HCP-DCNet: A Hierarchical Causal Primitive Dynamic Composition Network for Self-Improving Causal Understanding

Researchers introduce HCP-DCNet, a new AI framework that combines physical dynamics with symbolic causal reasoning to enable AI systems to understand cause-and-effect relationships. The system uses hierarchical causal primitives and can self-improve through interventions, potentially addressing current limitations in AI's ability to handle distribution shifts and counterfactual reasoning.

AI · Neutral · arXiv – CS AI · Mar 16 · 7/10

Superficial Safety Alignment Hypothesis

Researchers propose the Superficial Safety Alignment Hypothesis (SSAH), suggesting that AI safety alignment in large language models can be understood as a binary classification task of fulfilling or refusing user requests. The study identifies four types of critical components at the neuron level that establish safety guardrails, enabling models to retain safety attributes while adapting to new tasks.

AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

A Geometrically-Grounded Drive for MDL-Based Optimization in Deep Learning

Researchers introduce a novel optimization framework that integrates the Minimum Description Length (MDL) principle directly into deep neural network training dynamics. The method uses geometrically-grounded cognitive manifolds with coupled Ricci flow to create autonomous model simplification while maintaining data fidelity, with theoretical guarantees for convergence and practical O(N log N) complexity.

AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

Researchers have developed HTMuon, an optimization algorithm for training large language models that builds on the existing Muon optimizer. HTMuon addresses limitations in Muon's weight spectra by incorporating heavy-tailed spectral corrections, reducing perplexity by up to 0.98 in LLaMA pretraining experiments.

🏢 Perplexity
AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

Adaptive Activation Cancellation for Hallucination Mitigation in Large Language Models

Researchers developed Adaptive Activation Cancellation (AAC), a real-time framework that reduces hallucinations in large language models by identifying and suppressing problematic neural activations during inference. The method requires no fine-tuning or external knowledge and preserves model capabilities while improving factual accuracy across multiple model scales including LLaMA 3-8B.

🏢 Perplexity
AI · Bullish · arXiv – CS AI · Mar 12 · 7/10

The Curse and Blessing of Mean Bias in FP4-Quantized LLM Training

Researchers have identified a simple solution to training instability in 4-bit quantized large language models by removing mean bias, which causes the dominant spectral anisotropy. This mean-subtraction technique substantially improves FP4 training performance while being hardware-efficient, potentially enabling more accessible low-bit LLM training.