AIBullisharXiv – CS AI · 14h ago7/10
🧠Researchers introduce InfoQuant, a training-free method that optimizes activation distributions for low-bit quantization in large language models by using Peak Suppression Orthogonal Transformation. The technique achieves 97% accuracy preservation under W4A4KV4 quantization and reduces performance degradation by 42% compared to previous methods, advancing efficient LLM deployment.
AIBullisharXiv – CS AI · May 127/10
🧠MedThink presents a two-stage knowledge distillation framework that improves diagnostic accuracy in smaller language models by having teacher LLMs guide reasoning correction rather than simply transferring surface-level patterns. The approach achieves up to 12.7% improvement over baseline models while maintaining computational efficiency for resource-constrained clinical environments.
AIBullisharXiv – CS AI · May 117/10
🧠Researchers introduce SOD (Step-wise On-policy Distillation), a framework that improves small language models' ability to use tools and reason through complex tasks by adaptively controlling how much they learn from larger teacher models at each step. The approach achieves up to 20.86% improvement over existing methods and demonstrates that a 0.6B parameter model can reach 26.13% accuracy on AIME 2025, a significant benchmark for mathematical reasoning.
AIBullisharXiv – CS AI · May 77/10
🧠EdgeRazor introduces a lightweight quantization framework that compresses large language models to 1.88-bit precision while maintaining performance superior to existing 3-bit methods. The approach combines mixed-precision quantization with knowledge distillation and achieves up to 15.1× faster decoding with 80% storage reduction, requiring significantly lower computational training budgets than comparable techniques.
AIBullisharXiv – CS AI · Apr 107/10
🧠SpecQuant introduces a novel quantization framework using spectral decomposition to compress large language models to 4-bit precision for both weights and activations, achieving only 1.5% accuracy loss on LLaMA-3 8B while enabling 2x faster inference and 3x memory reduction. The technique exploits frequency domain properties to preserve essential signal components while suppressing high-frequency noise, addressing a critical challenge in deploying LLMs on edge devices.
AIBullisharXiv – CS AI · 14h ago6/10
🧠Researchers developed a specialized three-component pipeline for automated wind turbine blade inspection that combines object detection, spatial encoding, and a fine-tuned language model to generate structured maintenance reports. The system significantly outperforms general-purpose vision-language models, achieving 4% hallucination rate versus 65%, while running efficiently on edge hardware.
AINeutralarXiv – CS AI · May 126/10
🧠CardiacNAS presents an evolutionary neural architecture search framework that optimizes cardiac MRI segmentation models for both accuracy and computational efficiency. The approach achieves 93.22% dice similarity with only 3.58M parameters, demonstrating how resource-aware AI design can enable deployment of medical imaging models on resource-constrained environments.
AINeutralarXiv – CS AI · May 126/10
🧠Researchers demonstrate that extreme quantization of large language models causes degradation beyond numerical precision loss, specifically through reduced smoothness in prediction spaces. They introduce smoothness-preserving techniques in post-training and quantization-aware training that improve generation quality independent of numerical accuracy gains.
AIBullisharXiv – CS AI · May 116/10
🧠Researchers present a 2.5-D decomposition method that improves LLM-based spatial reasoning for autonomous construction tasks by constraining language models to 2D horizontal planning while deterministic systems handle vertical placement. The approach achieves 94.6% structural accuracy on benchmark tests, significantly outperforming existing methods and demonstrating practical deployment on edge hardware.
🏢 Nvidia🧠 GPT-4
AINeutralarXiv – CS AI · Apr 146/10
🧠ReSpinQuant introduces an efficient quantization framework for large language models that combines the expressivity of layer-wise adaptation with the computational efficiency of global rotation methods. By leveraging offline activation rotation fusion and residual subspace rotation matching, the approach achieves state-of-the-art performance on aggressive quantization schemes (W4A4, W3A3) without significant inference overhead.
AINeutralarXiv – CS AI · Apr 136/10
🧠Researchers introduce CoA-LoRA, a method that dynamically adapts LoRA fine-tuning to different quantization configurations without requiring separate retraining for each setting. The approach uses a configuration-aware model and Pareto-based search to optimize low-rank adjustments across heterogeneous edge devices, achieving comparable performance to traditional methods with zero additional computational cost.
AIBullisharXiv – CS AI · Apr 106/10
🧠Researchers introduce EmoMAS, a Bayesian multi-agent framework that enables small language models to perform sophisticated negotiation by treating emotional intelligence as a strategic variable. The system coordinates game-theoretic, reinforcement learning, and psychological agents to optimize negotiation outcomes while maintaining privacy through edge deployment, demonstrating performance comparable to larger models across high-stakes domains.