AI · Neutral · arXiv – CS AI · 5d ago · 6/10
🧠Researchers have developed Tail-Aware HiFloat4, a post-training quantization method that compresses text-to-video generation models using W4A4 (4-bit weights and activations) while maintaining output quality. The technique introduces activation-tail-aware calibration to handle statistical outliers, enabling efficient model deployment without retraining.
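The calibration problem this item describes can be illustrated with a minimal sketch. It deliberately uses a plain symmetric INT4 grid as a stand-in, not the HiFloat4 format or the paper's tail-aware calibration. It compares a max-based scale with a percentile-clipped scale on activations that contain one tail outlier. The helper names and the toy data are illustrative assumptions.

```python
import math

# Assumption: plain symmetric INT4 fake quantization, NOT HiFloat4.
# Contrasts a max-based scale with a percentile-clipped scale when the
# activations contain a single large "tail" outlier.

def fake_quant_int4(values, clip):
    """Quantize to the symmetric grid {-7..7} * step, then dequantize."""
    step = clip / 7.0  # 7 positive levels for a symmetric signed 4-bit code
    out = []
    for v in values:
        q = max(-7, min(7, round(v / step)))  # saturate anything beyond the clip
        out.append(q * step)
    return out

def percentile_abs(values, pct):
    """Nearest-rank percentile of |values|, pct in (0, 100]."""
    mags = sorted(abs(v) for v in values)
    k = max(0, min(len(mags) - 1, math.ceil(pct / 100 * len(mags)) - 1))
    return mags[k]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

bulk = [i / 50 - 1 for i in range(101)]  # 101 "typical" activations in [-1, 1]
acts = bulk + [50.0]                     # plus one outlier in the tail

for name, clip in [("max", max(abs(v) for v in acts)), ("p99", percentile_abs(acts, 99))]:
    deq = fake_quant_int4(acts, clip)
    print(name, "bulk MSE:", mse(bulk, deq[:-1]), "outlier error:", abs(acts[-1] - deq[-1]))
```

Neither scale wins on both parts. The max-based scale rounds every bulk value to zero, while the 99th-percentile scale resolves the bulk but saturates the outlier near 1.0. Handling that trade-off is the job of a tail-aware calibration.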
AI · Bullish · arXiv – CS AI · 5d ago · 6/10
🧠Researchers propose PushCen-ADFL, a new framework for asynchronous decentralized federated learning that reduces communication overhead by over 80% while improving accuracy under data heterogeneity. The approach uses centroid-based message compression and bias-correction aggregation to enable stable model training across distributed systems without central coordination.
AI · Bullish · arXiv – CS AI · 5d ago · 6/10
🧠Researchers introduce Dense2MoE, a framework that converts dense language models into efficient Mixture of Experts (MoE) architectures through unified pruning and upcycling, enabling viable on-device LLM deployment with improved latency-accuracy tradeoffs.
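The efficiency of an MoE comes from running only a few experts per token. The sketch below shows generic top-k gating. It is an assumption chosen for illustration, not Dense2MoE's router, and the toy experts and logits are hypothetical.

```python
import math

def top_k_route(logits, k):
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}

def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum over the outputs of the selected experts only."""
    gates = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in gates.items())

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, 0.5, 2.0]
print(moe_forward(1.0, experts, logits))  # only experts 1 and 3 are evaluated
```

With k=2 of 4 experts, the per-token expert compute is about half that of evaluating every expert, even though all four experts' parameters remain in the model.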
AI · Neutral · Decrypt – AI · 5d ago · 6/10
🧠OpenBMB has released a 1-billion-parameter AI model optimized for on-device execution on smartphones, with Model Context Protocol (MCP) support and agentic tool-use capabilities. Although the model enables local AI agents that do not depend on the cloud, it struggles with complex logical reasoning tasks.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce DARE, a technique that reduces computational redundancy in Diffusion Language Models by reusing cached attention activations across tokens. The method achieves up to 1.20x per-layer latency improvements while maintaining generation quality, addressing efficiency gaps between diffusion-based and auto-regressive language models.
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce CA-DSSL, a new self-supervised learning technique that enables efficient training of microcontroller-scale AI models with under 500K parameters. The method outperforms existing approaches by 18 percentage points on standard benchmarks while using significantly fewer parameters, reaching 94% of supervised-learning performance with models that fit in just 378 KB of memory.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers propose Compressed Video Aggregator (CVA), a lightweight module that improves micro-video recommendation systems by decoupling video processing from preference learning. The method reduces training time and GPU memory by orders of magnitude while maintaining or improving performance, using frame selection guided by video titles.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers demonstrate that extreme quantization degrades large language models in ways that go beyond numerical precision loss, notably by reducing the smoothness of the prediction space. They introduce smoothness-preserving techniques for both post-training quantization and quantization-aware training that improve generation quality independently of gains in numerical accuracy.
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce COAST, a novel pruning framework for vision-language models that reduces visual tokens by 77.8% while maintaining 98.64% performance and achieving 2.15x speedup. Unlike existing methods that discard low-attention tokens, COAST uses adaptive semantic routing to preserve contextually essential information, preventing 'Visual Aphasia'—a failure mode where models lose visual grounding.
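For contrast, here is a sketch of the attention-score baseline that COAST is described as improving on: keep the highest-attention visual tokens and drop the rest. This is not COAST's semantic routing. The 576/128 token counts are an illustrative assumption (576 is a common visual-token count), not figures from the paper, but they show how a 77.8% reduction can arise.

```python
def keep_top_attention(tokens, attn_scores, keep):
    """Baseline pruning: keep the `keep` highest-attention tokens, in original order."""
    ranked = sorted(range(len(tokens)), key=lambda i: attn_scores[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore positional order for the language model
    return [tokens[i] for i in kept]

n_visual, n_keep = 576, 128  # hypothetical token budget
print(f"reduction: {1 - n_keep / n_visual:.1%}")
```

A low-attention token may still carry context the answer depends on, which is the failure mode the summary calls "Visual Aphasia".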
AI · Bullish · arXiv – CS AI · May 12 · 6/10
🧠Researchers have developed a knowledge distillation framework that compresses a 7B 3D vision-language model into a 2.29B student model, achieving 8.7x faster inference while retaining 54-72% of the teacher's performance. The approach introduces "Hidden CoT," learnable latent tokens that enable spatial reasoning without explicit chain-of-thought training data, making 3D scene understanding feasible on resource-constrained devices.
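For readers unfamiliar with distillation, the sketch below shows the standard temperature-scaled distillation loss in the style of Hinton et al. It is a generic assumption for illustration. The paper's actual objective and its Hidden CoT tokens are not reproduced here.

```python
import math

def softmax(logits, T=1.0):
    m = max(logits)  # shift for numerical stability
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (T ** 2) * kl

print(kd_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

The T^2 factor keeps gradient magnitudes roughly comparable across temperatures when this term is mixed with a hard-label loss.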
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce Amortized-Precision Quantization (APQ) and MAQEE, a framework that optimizes Vision Transformers for low-precision deployment with early-exit mechanisms. By jointly optimizing exit thresholds and bit-widths while accounting for quantization noise across layers, the approach achieves up to 95% reduction in computational operations while maintaining accuracy across vision tasks.
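The early-exit half of this idea can be sketched as confidence-thresholded exits. Only the exit mechanism is shown here. Per-layer bit-widths and quantization noise, which APQ/MAQEE optimize jointly with the thresholds, are left out. The toy layers, heads, and thresholds are hypothetical.

```python
def early_exit_predict(x, layers, heads, thresholds):
    """Run blocks in order and stop at the first exit head whose top probability
    reaches its threshold. Returns (predicted_class, depth_used).
    Assumes at least one layer; give the last exit threshold 0.0 so it always fires."""
    h = x
    for depth, (layer, head, tau) in enumerate(zip(layers, heads, thresholds), start=1):
        h = layer(h)
        probs = head(h)
        conf = max(probs)
        if conf >= tau:
            return probs.index(conf), depth
    raise ValueError("no exit fired; set the final threshold to 0.0")

# Hypothetical toy model: each block adds 1, each head maps h to a 2-class distribution.
layers = [lambda h: h + 1] * 3
heads = [lambda h: [1 - h / 4, h / 4]] * 3
print(early_exit_predict(0, layers, heads, [0.7, 0.9, 0.0]))
```

Lowering a threshold trades accuracy for fewer executed blocks. Tuning those thresholds per exit is the knob the summary describes.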
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠TopoPrune introduces a topology-based framework for data pruning that addresses instability issues in geometric methods by leveraging intrinsic data structure rather than extrinsic geometry. The approach combines manifold approximation with persistent homology to achieve high accuracy at extreme pruning rates (90%) while maintaining robustness across architectures and noise conditions.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers demonstrate that KV-cache offloading techniques, designed to reduce memory usage in large language models, significantly degrade performance on context-intensive tasks requiring extensive information extraction. The study introduces the Text2JSON benchmark and identifies low-rank projection and unreliable landmarks as key failure points, proposing improved alternatives.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠A comprehensive academic survey examines edge deep learning—the integration of deep learning with edge computing—and its applications in computer vision and medical diagnostics. The paper categorizes hardware platforms, reviews model optimization techniques like compression and lightweight design, and identifies future challenges for deploying neural networks on resource-constrained devices.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers demonstrate that On-Policy Self-Distillation (OPSD) functions primarily as a compression mechanism rather than a correction tool for thinking-enabled mathematical reasoning models. They propose a revised training pipeline (SFT → RLVR → OPSD) that leverages OPSD's strengths in shortening responses while preserving accuracy on correct outputs.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers propose using evolutionary strategies to fine-tune quantized deep learning models, improving accuracy beyond standard nearest-neighbor quantization techniques. The approach selectively adjusts weight values across iterations to find better quantization states, demonstrating effectiveness on VGG, ResNet, and autoencoder architectures for image classification and detection tasks.
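Here is a minimal sketch of the idea, assuming the simplest evolution strategy: a (1+1)-style search. It starts from round-to-nearest integer codes, flips one weight to its other neighbouring grid point, and keeps the flip only if the layer's output error on sample inputs drops. This is an illustration, not the paper's algorithm, and all names and data are hypothetical.

```python
import math
import random

def output_error(codes, w, step, xs):
    """Squared error of outputs x . w when w is replaced by step * codes."""
    err = 0.0
    for x in xs:
        ref = sum(xi * wi for xi, wi in zip(x, w))
        q = sum(xi * step * c for xi, c in zip(x, codes))
        err += (ref - q) ** 2
    return err

def es_refine(w, step, xs, iters=200, seed=0):
    """(1+1)-style search starting from round-to-nearest codes; a mutation switches
    one weight between floor and ceil and is accepted only if output error drops."""
    rng = random.Random(seed)
    codes = [round(wi / step) for wi in w]  # nearest-neighbour starting point
    best = output_error(codes, w, step, xs)
    for _ in range(iters):
        j = rng.randrange(len(w))
        lo = math.floor(w[j] / step)
        cand = list(codes)
        cand[j] = lo + 1 if codes[j] == lo else lo  # the other neighbouring grid point
        err = output_error(cand, w, step, xs)
        if err < best:
            codes, best = cand, err
    return codes, best
```

Because only improvements are accepted, the result is never worse than round-to-nearest on the sampled inputs. Whether it generalizes beyond those inputs is a separate question.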
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers propose a knowledge distillation method for multi-modal AI systems that transfers information about the relationships between modalities from a teacher network to a student network by having the student learn the teacher's Gram matrix. Unlike existing methods, which focus only on final outputs, this enables deeper knowledge transfer across data modalities.
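A minimal sketch of a Gram-matrix matching loss follows. It assumes one embedding vector per modality, L2-normalized so that Gram entries are cosine similarities, and a mean-squared penalty. The paper's exact formulation may differ. Because the Gram matrix is M×M for M modalities, teacher and student embeddings do not need the same width.

```python
def gram(embeddings):
    """Gram matrix G[i][j] = <e_i, e_j> over per-modality embedding vectors."""
    return [[sum(a * b for a, b in zip(ei, ej)) for ej in embeddings] for ei in embeddings]

def normalize(v):
    n = sum(x * x for x in v) ** 0.5
    return [x / n for x in v]

def gram_distill_loss(student_embs, teacher_embs):
    """Mean squared difference between student and teacher Gram matrices
    (assumption: embeddings are L2-normalized first)."""
    gs = gram([normalize(e) for e in student_embs])
    gt = gram([normalize(e) for e in teacher_embs])
    m = len(gs)
    return sum((gs[i][j] - gt[i][j]) ** 2 for i in range(m) for j in range(m)) / (m * m)

# Hypothetical: 2 modalities, teacher width 3, student width 2.
print(gram_distill_loss([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]))
```

In the example, the teacher keeps the two modalities orthogonal while the student collapses them onto one direction, and the loss penalizes exactly that relational mismatch.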
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers have identified three fundamental dynamical principles—mutual alignment, unlocking, and racing—that explain how gradient descent training reduces neural network capacity to match task requirements. This theoretical advancement clarifies the mechanisms behind the lottery ticket hypothesis and why certain initial neuron conditions lead to higher weight norms, bridging a significant gap between empirical neural network success and theoretical understanding.
AI · Neutral · arXiv – CS AI · May 7 · 6/10
🧠Researchers introduce Budgeted LoRA, a distillation framework that compresses large language models by treating model compression as a structured compute allocation problem. The method achieves up to 4.05x speedup in inference through selective dense component removal and adaptive low-rank allocation, controlled by a single compute budget parameter.
AI · Neutral · arXiv – CS AI · May 4 · 6/10
🧠Researchers demonstrate that quantization—reducing AI model precision to improve efficiency—paradoxically increases energy consumption and degrades reasoning accuracy in multi-hop reasoning tasks, contradicting established neural scaling laws. The study identifies hardware dequantization overhead as a critical bottleneck and proposes a Critical Model Scale metric to predict when quantization becomes counterproductive across different model sizes and hardware configurations.
AI · Bullish · arXiv – CS AI · May 1 · 6/10
🧠BoostLoRA introduces a gradient-boosting framework in which parameter-efficient fine-tuning adapters grow their effective rank iteratively, allowing adapters with very few trainable parameters to match or exceed full fine-tuning performance on mathematical reasoning, code generation, and protein classification tasks. Adapters are merged into the model, so there is zero inference overhead, and each round adds only a small number of parameters.
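Merging explains both the rank growth and the zero inference overhead. The sketch below assumes the usual LoRA update W' = W + scale · (B @ A) and hypothetical rank-1 adapters from two successive rounds. It is not BoostLoRA's training procedure. The merged weight keeps its original shape, so inference cost is unchanged, while the accumulated update can reach rank 2.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def merge_lora(W, B, A, scale=1.0):
    """Fold a low-rank update into the base weight: W' = W + scale * (B @ A)."""
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))] for i in range(len(W))]

def rank(M):
    """Exact matrix rank via Gauss-Jordan elimination over fractions."""
    rows = [[Fraction(x) for x in r] for r in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

W0 = [[0] * 3 for _ in range(3)]                  # zero base weight, so W' - W0 = update
W1 = merge_lora(W0, [[1], [0], [0]], [[0, 1, 0]])  # round 1: rank-1 adapter
W2 = merge_lora(W1, [[0], [1], [0]], [[0, 0, 1]])  # round 2: another rank-1 adapter
print(rank(W1), rank(W2))
```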
AI · Bearish · arXiv – CS AI · May 1 · 6/10
🧠Researchers challenge the conventional wisdom that large language models contain significant redundant parameters, demonstrating that small-magnitude weights encode crucial knowledge for difficult downstream tasks. The study reveals that pruning these weights causes irreversible performance degradation that cannot be recovered through continued training, with effects monotonically correlated to task difficulty.
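The conventional criterion this study challenges is magnitude pruning, which zeroes the smallest-|w| weights on the assumption that they are redundant. A minimal unstructured sketch follows. The helper name and sparsity value are illustrative.

```python
def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]

print(magnitude_prune([0.5, -0.01, 0.2, 0.03], 0.5))
```

According to the study, exactly the weights this rule removes first encode knowledge needed for difficult downstream tasks.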
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers introduce Self-Distillation Fine-Tuning (SDFT), a framework that recovers the performance Large Language Models lose to compression, quantization, and catastrophic forgetting. Using Centered Kernel Alignment (CKA) analysis, the study shows that self-distillation works by aligning the student model's high-dimensional representation manifold with the teacher model's optimal representation structure.
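The analysis tool named here can be sketched as linear CKA, CKA(X, Y) = ||YᵀX||_F² / (||XᵀX||_F · ||YᵀY||_F) on column-centered representations of the same examples. This assumes the linear-kernel variant, which the summary does not specify. Rows are examples, and the two representations may have different widths.

```python
def center(X):
    """Subtract the per-feature (column) mean from an n x d matrix."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def frob_sq_cross(X, Y):
    """||Y^T X||_F^2 for an n x dx matrix X and an n x dy matrix Y."""
    total = 0.0
    for a in range(len(Y[0])):
        for b in range(len(X[0])):
            s = sum(Y[i][a] * X[i][b] for i in range(len(X)))
            total += s * s
    return total

def linear_cka(X, Y):
    """Linear CKA between two representations of the same n examples."""
    Xc, Yc = center(X), center(Y)
    return frob_sq_cross(Xc, Yc) / (frob_sq_cross(Xc, Xc) ** 0.5 * frob_sq_cross(Yc, Yc) ** 0.5)

X = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [2.0, 2.0]]
rotated = [[b, -a] for a, b in X]  # 90-degree rotation of the same representation
print(linear_cka(X, rotated))
```

CKA is invariant to rotations and isotropic scaling, so it compares representation geometry rather than individual neurons. That property is what makes it suitable for the manifold-alignment argument.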
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠ReSpinQuant introduces an efficient quantization framework for large language models that combines the expressivity of layer-wise adaptation with the computational efficiency of global rotation methods. By leveraging offline activation rotation fusion and residual subspace rotation matching, the approach achieves state-of-the-art performance on aggressive quantization schemes (W4A4, W3A3) without significant inference overhead.
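Rotation-based quantizers rely on one identity: for an orthogonal H, (xH)(Hᵀw) = x·w, so rotating activations and counter-rotating weights leaves the output unchanged while spreading outliers across channels. The sketch below uses a normalized Sylvester Hadamard matrix as an illustrative choice. It does not reproduce ReSpinQuant's offline fusion or residual subspace matching.

```python
def hadamard(n):
    """Sylvester Hadamard matrix of size n (a power of two), scaled by 1/sqrt(n)
    so that it is orthogonal."""
    H = [[1.0]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    s = n ** -0.5
    return [[v * s for v in row] for row in H]

def row_times(x, H):
    """Row vector times matrix: x @ H (equivalently H^T x as a column)."""
    return [sum(x[i] * H[i][j] for i in range(len(x))) for j in range(len(H[0]))]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

H = hadamard(4)
x = [8.0, 0.0, 0.0, 0.0]   # activation with one outlier channel
w = [1.0, 2.0, 3.0, 4.0]
x_rot, w_rot = row_times(x, H), row_times(w, H)
print(max(map(abs, x)), max(map(abs, x_rot)))  # outlier magnitude before vs after
print(dot(x, w), dot(x_rot, w_rot))            # layer output is preserved
```

Halving the peak magnitude lets a 4-bit grid use a finer step for the same tensor. Because the rotation can be folded into the weights offline, its runtime cost can be kept small.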
AI · Bullish · arXiv – CS AI · Apr 13 · 6/10
🧠Researchers demonstrate that HiFloat4, a 4-bit floating-point format, enables efficient large language model training on Huawei's Ascend NPUs with up to 4x improvements in compute throughput and memory efficiency. The study shows that specialized stabilization techniques can maintain accuracy within 1% of full-precision baselines while preserving computational gains across dense and mixture-of-experts architectures.