#sparse-models News & Analysis

13 articles tagged with #sparse-models. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 19 · 7/10

Lagrange: An Open-Vocabulary, Energy-Based Sparse Framework for Generalized End-to-End Driving

Researchers introduce Lagrange, an open-vocabulary autonomous driving framework that combines Vision-Language Models with sparse, energy-based planning to address limitations in existing end-to-end driving systems. The approach balances computational efficiency with generalization capacity for handling out-of-distribution scenarios while maintaining kinematic feasibility.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

SHAPE: Coalition-Aware Expert Pruning for Sparse Mixture-of-Experts LLMs

Researchers introduce SHAPE, a novel expert pruning framework for Sparse Mixture-of-Experts (MoE) language models that reduces memory requirements by up to 40% without retraining. Unlike traditional pruning methods that evaluate experts independently, SHAPE models expert cooperation using game theory, identifying which expert combinations matter most for model performance.
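The difference between scoring experts independently and scoring them as cooperating players can be illustrated with a small sketch. The summary does not say which game-theoretic tool SHAPE uses; the Shapley value below is an assumption chosen for illustration, and `shapley_values` and `toy_value` are hypothetical names, not part of the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a coalition value function."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of coalitions of size k that do not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when it joins coalition s.
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def toy_value(coalition):
    """Hypothetical quality score: experts 0 and 1 only help as a pair,
    expert 2 helps on its own."""
    score = 0.0
    if {0, 1} <= coalition:
        score += 1.0
    if 2 in coalition:
        score += 0.4
    return score


experts = [0, 1, 2]
phi = shapley_values(experts, toy_value)
# Independent scoring: value of each expert kept alone.
solo = {e: toy_value(frozenset({e})) for e in experts}
```

In this toy setup, independent scoring rates experts 0 and 1 as worthless because neither helps alone, while the coalition-aware score gives each of them more credit than expert 2. Exact enumeration is exponential in the number of experts, so a practical method would need an approximation.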

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

L³: Large Lookup Layers

Researchers introduce Large Lookup Layers (L³), a novel sparse architecture that generalizes embedding tables to decoder layers, enabling more efficient scaling than traditional Mixture-of-Experts models. The approach uses static token-based routing to aggregate learned embeddings contextually, achieving superior performance on language modeling tasks with up to 2.6B active parameters while maintaining hardware efficiency.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

BudgetDraft: Acceptance-Aware Multi-View Training for Sparse-KV Speculative Decoding

BudgetDraft is a new training method for sparse-KV speculative decoding that enables faster language model inference under memory constraints. By training drafters to handle multiple KV cache budgets simultaneously, the technique achieves up to 6.55× speedup on mid-to-long context inference while maintaining acceptance rates and reducing GPU memory usage.

AI · Bullish · arXiv – CS AI · Jun 2 · 7/10

When Data Is Scarce: Scaling Sparse Language Models with Repeated Training

Researchers demonstrate that sparse neural networks can improve scaling efficiency in data-limited training scenarios, where models must train for multiple epochs on repeated data. The study introduces a scaling law that predicts performance across sparsity levels of up to 93.75%. It finds that moderate sparsity of around 50% minimizes loss, while higher sparsity improves compute efficiency, challenging the assumption that sparsity is purely an efficiency tool.
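As a quick check on what these sparsity figures mean, the sketch below converts a sparsity percentage into the fraction of parameters that remain active. It assumes sparsity is defined as the share of inactive parameters; the halving ladder of levels and the 1.6B parameter count are hypothetical values for illustration, not figures from the study.

```python
from fractions import Fraction


def active_fraction(sparsity_percent):
    """Fraction of parameters still active at the given sparsity (in percent)."""
    return 1 - Fraction(sparsity_percent) / 100


def active_params(total_params, sparsity_percent):
    """Number of active parameters for a model with total_params parameters."""
    return int(total_params * active_fraction(sparsity_percent))


# Hypothetical ladder of sparsity levels, each halving the active share.
levels = ["50", "75", "87.5", "93.75"]
fractions = [active_fraction(s) for s in levels]

# Hypothetical 1.6B-parameter model at the highest sparsity level.
remaining = active_params(1_600_000_000, "93.75")
```

Under this definition, 93.75% sparsity leaves 1/16 of the parameters active, and 50% sparsity leaves half.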

AI · Bullish · arXiv – CS AI · May 27 · 7/10

ReMoE: Boosting Expert Reuse through Router Fine-Tuning in Memory-Constrained MoE LLM Inference

Researchers introduce ReMoE, a router fine-tuning framework that optimizes Mixture-of-Experts language models for memory-constrained inference by increasing expert reuse and reducing storage I/O overhead. The approach improves expert reuse by 26% while maintaining performance, delivering up to 1.99× decode speedup on edge devices.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

Routing-Aware Expert Calibration for Machine Unlearning in Mixture-of-Experts Language Models

Researchers propose TRACE, a machine unlearning technique designed specifically for Mixture-of-Experts language models. It addresses the problem of forget-critical experts receiving insufficient regularization during unlearning. The method achieves a 9% relative utility improvement by detecting and calibrating expert activation patterns to match forget and retain data distributions, with consistent gains across multiple MoE architectures.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Beyond Task-Agnostic: Task-Aware Grouping for Communication-Efficient Multi-Task MoE Inference

Researchers propose Task-Aware Coactivation Grouping (TACG), a framework for optimizing Mixture-of-Experts (MoE) model inference across distributed GPUs by grouping experts based on task-specific activation patterns rather than global averages. The approach reduces communication costs by 31.39% while maintaining load balance, addressing a critical efficiency bottleneck in multi-task AI serving.

AI · Bullish · arXiv – CS AI · May 28 · 6/10

FPMoE: A Sparse Mixture-of-Experts Approach to Functional Code Generation

Researchers introduce FPMoE, a sparse Mixture-of-Experts model optimized for functional programming languages like Haskell, OCaml, and Scala, addressing a significant gap in LLM-based code generation. With only 3B active parameters, the model matches the performance of much larger models while using a novel architecture combining language-specific experts with a shared expert for cross-language functional patterns.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

SDG-MoE: Signed Debate Graph Mixture-of-Experts

Researchers introduce SDG-MoE, a mixture-of-experts architecture that lets routed experts deliberate through signed graph communication before their outputs are aggregated. The model demonstrates a 19.8% perplexity improvement over vanilla MoE and achieves state-of-the-art results on multiple language modeling benchmarks while maintaining computational efficiency.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Mixture of Masters: Sparse Chess Language Models with Player Routing

Researchers introduce Mixture-of-Masters (MoM), a sparse mixture-of-experts chess language model that routes moves through specialized GPT experts trained on individual grandmaster playing styles. The system outperforms dense transformer baselines and maintains interpretability by dynamically selecting which grandmaster persona to channel based on game state.

AI · Bullish · OpenAI News · Nov 13 · 6/10

Understanding neural networks through sparse circuits

OpenAI is researching mechanistic interpretability through sparse neural network models to better understand AI reasoning processes. This approach aims to make AI systems more transparent and improve their safety and reliability.

AI · Bullish · arXiv – CS AI · Mar 2 · 5/10

SDMixer: Sparse Dual-Mixer for Time Series Forecasting

Researchers have developed SDMixer, a new AI framework for multivariate time series forecasting that uses dual-stream sparse processing to analyze data in both frequency and time domains. The method employs sparsity mechanisms to filter noise and improve cross-variable dependency modeling, achieving leading performance on real-world datasets in transportation, energy, and finance applications.