AI · Bearish · arXiv – CS AI · 2d ago · 7/10
🧠Researchers challenge the assumption that memorization in text-to-image diffusion models can be localized to specific weights, demonstrating that pruning-based mitigation can be bypassed through minor perturbations of the text embeddings. The study finds that memorization is distributed throughout embedding space, suggesting that current mitigation strategies are fundamentally fragile and that new approaches are needed to protect training data privacy.
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers apply game-theoretic free energy principles to analyze attention head interactions in large language models, discovering that heads exhibit higher-order redundancy. Their framework enables principled pruning of low-contribution heads, achieving an 18% FLOP reduction and a 22% throughput improvement in GPT-2 with minimal performance degradation.
🏢 Perplexity · 🧠 Llama
AI · Bullish · arXiv – CS AI · May 12 · 7/10
🧠Researchers introduce HA-HeteroGNN, a Graph Neural Network framework that improves both interpretability and efficiency through hierarchical attention mechanisms and relevance-driven pruning. The approach achieves a 27% reduction in graph edges while improving classification accuracy by up to 2.46% and reducing training time by 43.9%.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers analyzed compression effects on large reasoning models (LRMs) through quantization, distillation, and pruning methods. They found that dynamically quantized 2.51-bit models maintain near-original performance, while identifying critical weight components and showing that protecting just 2% of excessively compressed weights can improve accuracy by 6.57%.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠Researchers introduce HEAPr, a novel pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes experts into atomic components for more precise pruning. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing computational costs by approximately 20%.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce Variance-Regularised Pruning (VR), a neural network pruning technique that reduces model size while maintaining robust performance across diverse users. The method balances computational efficiency with cross-participant stability in affective computing systems, achieving 80% sparsity without sacrificing reliability on the AGAIN emotion recognition dataset.
AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠Researchers present a method for aggressively pruning expert modules from mixture-of-experts large language models to create specialized translation systems. The approach removes up to 90% of experts with minimal performance degradation, demonstrating that translation tasks require only a fraction of a full LLM's parameters, enabling substantial model compression.
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers propose REAM (Router-weighted Expert Activation Merging), a new method for compressing large language models that groups and merges expert weights instead of pruning them. The technique preserves model performance better than existing pruning methods while reducing memory requirements for deployment.
AI · Bullish · arXiv – CS AI · Mar 2 · 6/10 · 15
🧠Researchers introduce FineScope, a framework that uses Sparse Autoencoder (SAE) techniques to create smaller, domain-specific language models from larger pretrained LLMs through structured pruning and self-data distillation. The method achieves competitive performance while significantly reducing computational requirements compared to training from scratch.
AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 7
🧠Researchers propose CA-AFP, a new federated learning framework that combines client clustering with adaptive model pruning to address both statistical and system heterogeneity challenges. The approach achieves better accuracy and fairness while reducing communication costs compared to existing methods, as demonstrated on human activity recognition benchmarks.