y0news

#model-pruning News & Analysis

5 articles tagged with #model-pruning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

5 articles
AI · Neutral · arXiv – CS AI · Mar 37/104
🧠

When Reasoning Meets Compression: Understanding the Effects of LLMs Compression on Large Reasoning Models

Researchers analyzed the effects of compression on large reasoning models (LRMs) across quantization, distillation, and pruning methods. They found that dynamically quantized 2.51-bit models maintain near-original performance. They also identified critical weight components, showing that protecting just 2% of excessively compressed weights can improve accuracy by 6.57%.
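The "protect a small fraction of critical weights" idea can be sketched in a few lines of NumPy. This is an illustrative toy, not the paper's method: the selection rule (largest-magnitude weights) and the uniform symmetric quantizer are our assumptions.

```python
import numpy as np

def quantize_uniform(w, bits):
    # Uniform symmetric quantization to the given bit width.
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

def quantize_with_protection(w, bits, protect_frac=0.02):
    # Quantize everything, then restore the largest-magnitude
    # fraction of weights (the "critical" 2%) to full precision.
    q = quantize_uniform(w, bits).ravel()
    k = max(1, int(protect_frac * w.size))
    idx = np.argsort(np.abs(w).ravel())[-k:]  # top-|w| positions
    q[idx] = w.ravel()[idx]
    return q.reshape(w.shape)

rng = np.random.default_rng(0)
# Heavy-tailed weights: a few large outliers dominate the quantization scale.
w = rng.standard_normal((64, 64)) * rng.standard_normal((64, 64))
err_plain = np.mean((w - quantize_uniform(w, 3)) ** 2)
err_protected = np.mean((w - quantize_with_protection(w, 3)) ** 2)
```

Because the protected weights are restored exactly, the reconstruction error can only decrease; the interesting empirical claim in the paper is how large that gain is on real reasoning benchmarks.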

AI · Bullish · arXiv – CS AI · Mar 37/104
🧠

HEAPr: Hessian-based Efficient Atomic Expert Pruning in Output Space

Researchers introduce HEAPr, a novel pruning algorithm for Mixture-of-Experts (MoE) language models that decomposes experts into atomic components for more precise pruning. The method achieves nearly lossless compression at 20-25% pruning ratios while reducing computational costs by approximately 20%.
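Hessian-based pruning generally ranks components by a second-order estimate of the loss increase their removal would cause. A minimal sketch of that scoring step, using the classic OBD-style diagonal saliency as a stand-in (HEAPr's atomic-expert decomposition is not reproduced here):

```python
import numpy as np

def saliency(weights, hessian_diag):
    # OBD-style second-order saliency: approximate loss increase from
    # zeroing these weights, 0.5 * sum_i H_ii * w_i^2.
    return 0.5 * np.sum(hessian_diag * weights ** 2)

def prune_experts(experts, hessian_diags, ratio=0.25):
    # Drop the `ratio` fraction of experts with the lowest saliency.
    scores = np.array([saliency(w, h) for w, h in zip(experts, hessian_diags)])
    n_drop = int(ratio * len(experts))
    kept = np.argsort(scores)[n_drop:]  # indices of surviving experts
    return sorted(kept.tolist()), scores

rng = np.random.default_rng(1)
# Four experts; expert 2 has near-zero weights and should be pruned first.
experts = [rng.standard_normal(256) * s for s in (1.0, 0.8, 0.01, 1.2)]
hessian_diags = [np.abs(rng.standard_normal(256)) for _ in experts]
kept, scores = prune_experts(experts, hessian_diags, ratio=0.25)
```

The 20–25% pruning ratios reported in the abstract correspond to `ratio` here; the paper's contribution is computing these importance scores cheaply in output space at atomic-expert granularity.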

AI · Bullish · arXiv – CS AI · Apr 76/10
🧠

REAM: Merging Improves Pruning of Experts in LLMs

Researchers propose REAM (Router-weighted Expert Activation Merging), a new method for compressing large language models that groups and merges expert weights instead of pruning them. The technique preserves model performance better than existing pruning methods while reducing memory requirements for deployment.
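The core merging step, router-weighted averaging of grouped expert weights, can be sketched as follows. This is a simplified illustration under our own assumptions (fixed groups, mean routing probabilities as weights), not REAM's full procedure:

```python
import numpy as np

def merge_group(experts, router_probs, group):
    # Average the experts in `group`, weighted by how often the router
    # activated each one (weights renormalized within the group).
    w = np.array([router_probs[i] for i in group], dtype=float)
    w /= w.sum()
    stacked = np.stack([experts[i] for i in group])
    return np.tensordot(w, stacked, axes=1)

rng = np.random.default_rng(2)
experts = [rng.standard_normal((4, 4)) for _ in range(4)]
router_probs = [0.5, 0.1, 0.3, 0.1]   # mean routing probability per expert
merged = [merge_group(experts, router_probs, g) for g in ([0, 1], [2, 3])]
```

Merging rather than deleting means low-traffic experts still contribute to the surviving weights, which is why this tends to degrade performance less than hard pruning at the same memory budget.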

AI · Bullish · arXiv – CS AI · Mar 26/1015
🧠

FineScope: SAE-guided Data Selection Enables Domain Specific LLM Pruning and Finetuning

Researchers introduce FineScope, a framework that uses Sparse Autoencoder (SAE) techniques to create smaller, domain-specific language models from larger pretrained LLMs through structured pruning and self-data distillation. The method achieves competitive performance while significantly reducing computational requirements compared to training from scratch.
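SAE-guided data selection boils down to scoring candidate samples by how strongly they activate the sparse features associated with the target domain. A toy sketch of that scoring step (the feature matrix, domain mask, and scoring rule are our illustrative assumptions, not FineScope's implementation):

```python
import numpy as np

def select_domain_samples(sample_feats, domain_feature_mask, top_k):
    # Score each sample by its total activation on the domain's
    # features, then keep the top_k highest-scoring samples.
    scores = sample_feats[:, domain_feature_mask].sum(axis=1)
    return np.argsort(scores)[::-1][:top_k]

# Toy sparse feature activations (rows = samples, cols = SAE features).
sample_feats = np.array([
    [1.0, 0.0, 0.0, 2.0],   # some activation on domain features
    [0.0, 3.0, 0.0, 0.0],   # off-domain sample
    [2.0, 0.0, 1.0, 4.0],   # strongly in-domain
])
domain_mask = np.array([True, False, False, True])  # features tied to the target domain
selected = select_domain_samples(sample_feats, domain_mask, top_k=2)
```

The selected subset then drives both the structured pruning and the self-data distillation that the abstract describes.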

AI · Neutral · arXiv – CS AI · Mar 34/107
🧠

CA-AFP: Cluster-Aware Adaptive Federated Pruning

Researchers propose CA-AFP, a new federated learning framework that combines client clustering with adaptive model pruning to address both statistical and system heterogeneity challenges. The approach achieves better accuracy and fairness while reducing communication costs compared to existing methods, as demonstrated on human activity recognition benchmarks.
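The two ingredients, clustering clients by data distribution (statistical heterogeneity) and scaling pruning ratios to device capacity (system heterogeneity), can each be sketched simply. This is a generic illustration, not CA-AFP's algorithm; the k-means clustering, linear ratio schedule, and its bounds are our assumptions:

```python
import numpy as np

def cluster_clients(label_dists, n_clusters=2, iters=10):
    # Plain k-means on client label distributions, initialized from the
    # first n_clusters clients.
    centers = label_dists[:n_clusters].astype(float).copy()
    assign = np.zeros(len(label_dists), dtype=int)
    for _ in range(iters):
        dists = ((label_dists[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centers[c] = label_dists[assign == c].mean(axis=0)
    return assign

def adaptive_prune_ratio(capacity, lo=0.2, hi=0.8):
    # Weaker devices (capacity near 0) get a higher pruning ratio;
    # capable devices (capacity near 1) keep more of the model.
    return hi - capacity * (hi - lo)

# Four clients: two skewed toward class 0, two toward class 1.
label_dists = np.array([
    [0.9, 0.1],
    [0.1, 0.9],
    [0.85, 0.15],
    [0.15, 0.85],
])
assign = cluster_clients(label_dists)
ratios = [adaptive_prune_ratio(c) for c in (1.0, 0.5, 0.0)]
```

Clustering lets each group train a sub-model suited to its data, while the adaptive ratio keeps slow devices from stalling each communication round, which is where the reported accuracy, fairness, and communication-cost gains come from.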