
#model-optimization News & Analysis

98 articles tagged with #model-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

🧠 AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Efficient Adversarial Training via Criticality-Aware Fine-Tuning

Researchers introduce Criticality-Aware Adversarial Training (CAAT), a parameter-efficient method that identifies and fine-tunes only the most robustness-critical parameters in Vision Transformers, achieving 94.3% of standard adversarial training robustness while tuning just 6% of model parameters. This breakthrough addresses the computational bottleneck preventing large-scale adversarial training deployment.
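
The summary does not say how criticality is scored; a minimal sketch of the general recipe, using gradient magnitude under the adversarial loss as a stand-in score and the 6% figure as the tuning budget, might look like this:

```python
import torch

def select_critical_params(model, adv_loss, budget=0.06):
    """Freeze everything except the `budget` fraction of parameter tensors whose
    gradients under the adversarial loss are largest (a stand-in criticality
    score, not the paper's exact criterion)."""
    adv_loss.backward()
    scores = {
        name: p.grad.abs().mean().item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }
    k = max(1, int(len(scores) * budget))
    critical = set(sorted(scores, key=scores.get, reverse=True)[:k])
    for name, p in model.named_parameters():
        p.requires_grad_(name in critical)   # fine-tune only the critical subset
    return critical
```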

🧠 AI · Bullish · arXiv – CS AI · 1d ago · 7/10

Chain-of-Models Pre-Training: Rethinking Training Acceleration of Vision Foundation Models

Researchers present Chain-of-Models Pre-Training (CoM-PT), a novel method that accelerates vision foundation model training by up to 7.09X through sequential knowledge transfer from smaller to larger models in a unified pipeline, rather than training each model independently. The approach maintains or improves performance while significantly reducing computational costs, with efficiency gains increasing as more models are added to the training sequence.

🧠 AI · Bullish · arXiv – CS AI · 2d ago · 7/10

MoEITS: A Green AI approach for simplifying MoE-LLMs

Researchers present MoEITS, a novel algorithm for simplifying Mixture-of-Experts large language models while maintaining performance and reducing computational costs. The method outperforms existing pruning techniques across multiple benchmark models including Mixtral 8×7B and DeepSeek-V2-Lite, addressing the energy and resource efficiency challenges of deploying advanced LLMs.
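
The MoEITS algorithm itself is not described in the summary; for context, a generic usage-based expert-pruning baseline of the kind it is compared against might look like this (the routing-frequency criterion is an assumption, not the paper's method):

```python
import torch

def prune_experts_by_usage(router_logits, keep_ratio=0.5):
    """Rank experts by how often the router picks them over a calibration set and
    keep the most-used fraction. `router_logits` is (num_tokens, num_experts)."""
    num_experts = router_logits.shape[-1]
    top1 = router_logits.argmax(dim=-1)                        # chosen expert per token
    usage = torch.bincount(top1, minlength=num_experts).float()
    k = max(1, int(num_experts * keep_ratio))
    keep = usage.topk(k).indices.sort().values                 # expert ids to retain
    return keep
```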

🧠 AI · Neutral · arXiv – CS AI · 2d ago · 7/10

Exploring the impact of fairness-aware criteria in AutoML

Researchers demonstrate that integrating fairness metrics directly into AutoML optimization improves algorithmic fairness by 14.5% while reducing data usage by 35.7%, though at the cost of a 9.4% decrease in predictive accuracy. This study challenges the industry standard of prioritizing performance over fairness and shows that simpler, fairer ML models can achieve practical balance without requiring complex architectures.
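
To make the trade-off concrete, here is a hypothetical candidate-scoring function an AutoML search could maximise; the demographic-parity metric and the blending weight are illustrative assumptions, not the study's exact criteria:

```python
import numpy as np

def fairness_aware_score(y_true, y_pred, group, alpha=0.5):
    """Blend of accuracy and demographic parity for ranking AutoML candidates.
    Assumes binary predictions and a binary protected-group indicator."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    accuracy = np.mean(y_true == y_pred)
    rate_a = y_pred[group == 0].mean()      # positive-prediction rate, group 0
    rate_b = y_pred[group == 1].mean()      # positive-prediction rate, group 1
    parity_gap = abs(rate_a - rate_b)       # 0 means perfectly fair under this metric
    return alpha * accuracy + (1 - alpha) * (1 - parity_gap)
```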

🏒 Meta
🧠 AI · Bullish · arXiv – CS AI · 2d ago · 7/10

Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards

Researchers demonstrate that Reinforcement Learning from Verifiable Rewards (RLVR) can train Large Language Models to negotiate effectively in incomplete-information games like price bargaining. A 30B parameter model trained with this method outperforms frontier models 10x its size and develops sophisticated persuasive strategies while generalizing to unseen negotiation scenarios.

🧠 AI · Bullish · arXiv – CS AI · 2d ago · 7/10

Introspective Diffusion Language Models

Researchers introduce Introspective Diffusion Language Models (I-DLM), a new approach that combines the parallel generation speed of diffusion models with the quality of autoregressive models by ensuring models verify their own outputs. I-DLM achieves performance matching conventional large language models while delivering 3x higher throughput, potentially reshaping how AI systems are deployed at scale.

🧠 AI · Bullish · arXiv – CS AI · 2d ago · 7/10

Min-$k$ Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics

Researchers propose Min-k Sampling, a novel decoding strategy for large language models that dynamically identifies semantic cliffs in logit distributions to optimize token truncation. Unlike temperature-sensitive methods like Top-k and Top-p, Min-k achieves temperature invariance through relative logit dynamics while maintaining superior text quality across reasoning, creative writing, and human evaluation benchmarks.
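
The paper's exact truncation rule is not given in the summary; one reading of a temperature-invariant "cliff" rule, sketched under that assumption, is to cut at the largest gap in the sorted raw logits:

```python
import torch

def min_k_sample(logits, temperature=1.0):
    """Cut the vocabulary at the largest gap ('cliff') in the sorted raw logits,
    then sample from the kept tokens. Because the cut happens before temperature
    scaling, the kept set does not change with temperature."""
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    gaps = sorted_logits[:-1] - sorted_logits[1:]     # drop between neighbouring logits
    cutoff = int(gaps.argmax()) + 1                   # keep everything above the biggest drop
    probs = torch.softmax(sorted_logits[:cutoff] / temperature, dim=-1)
    return sorted_idx[:cutoff][torch.multinomial(probs, 1)]
```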

🧠 AI · Bullish · arXiv – CS AI · 3d ago · 7/10

The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs

Researchers introduce the Two-Stage Decision-Sampling Hypothesis to explain how reinforcement learning enables self-reflection capabilities in large language models, demonstrating that RL's superior performance stems from improved decision-making rather than generation quality. The theory shows that reward gradients distribute asymmetrically across policy components, explaining why RL succeeds where supervised fine-tuning fails.

🧠 AI · Bearish · arXiv – CS AI · 3d ago · 7/10

On the Limits of Layer Pruning for Generative Reasoning in Large Language Models

Research demonstrates that layer pruning, a compression technique for large language models, effectively reduces model size while maintaining classification performance, but critically fails to preserve generative reasoning capabilities like arithmetic and code generation. Even with extensive post-training on 400B tokens, models cannot recover lost reasoning abilities, revealing fundamental limitations in current compression approaches.
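
For reference, layer pruning itself is mechanically simple; a sketch assuming a LLaMA-style model whose decoder blocks sit in an nn.ModuleList named model.layers:

```python
import torch.nn as nn

def prune_layers(model, start, end):
    """Drop decoder blocks with indices in [start, end). Assumes the blocks live
    in an nn.ModuleList at `model.layers`; adjust per architecture."""
    kept = [blk for i, blk in enumerate(model.layers) if not (start <= i < end)]
    model.layers = nn.ModuleList(kept)
    return model
```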

🤖 AI × Crypto · Bullish · The Register – AI · 4d ago · 7/10

Growing void between enterprise and frontier AI puts open weights models in the spotlight

A widening performance gap between proprietary enterprise AI models and open-source alternatives is reshaping the AI landscape, with open-weight models gaining prominence as organizations seek cost-effective and customizable solutions. This shift challenges the dominance of closed models and creates new opportunities for developers and businesses to leverage decentralized AI infrastructure.

🧠 AI · Bullish · arXiv – CS AI · 6d ago · 7/10

Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models

Researchers introduce Perception-Grounded Policy Optimization (PGPO), a novel fine-tuning framework that improves how large vision-language models learn from visual inputs by strategically allocating learning signals to vision-dependent tokens rather than treating all tokens equally. Testing on the Qwen2.5-VL series demonstrates an average 18.7% performance boost across multimodal reasoning benchmarks.
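
The core move can be illustrated as a token-weighted loss; how the per-token vision-dependence weights are estimated is the paper's contribution and is left as an input here:

```python
import torch.nn.functional as F

def perception_weighted_loss(logits, targets, vision_weight):
    """Cross-entropy where each token is scaled by its vision-dependence weight,
    a per-token tensor in [0, 1] supplied by the caller."""
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)), targets.view(-1), reduction="none"
    )
    w = vision_weight.view(-1).to(per_token.dtype)
    return (w * per_token).sum() / w.sum().clamp_min(1e-8)
```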

🧠 AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

Zero-Shot Quantization via Weight-Space Arithmetic

Researchers have developed a zero-shot quantization method that transfers robustness between AI models through weight-space arithmetic, improving post-training quantization performance by up to 60% without requiring additional training. This breakthrough enables low-cost deployment of extremely low-bit models by extracting 'quantization vectors' from donor models to patch receiver models.
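
The summary points to task-vector-style arithmetic on weights; a sketch under that reading, with the donor/receiver naming taken from the summary and the scaling factor as an illustrative assumption:

```python
import torch

def extract_quantization_vector(donor_robust_sd, donor_base_sd):
    """Per-tensor delta between a quantization-robust donor checkpoint and its
    base; both are plain state dicts sharing the receiver's architecture."""
    return {k: donor_robust_sd[k] - donor_base_sd[k] for k in donor_base_sd}

def patch_receiver(receiver_sd, q_vector, scale=1.0):
    """Add the quantization vector to the receiver's weights before applying
    post-training quantization; the linear `scale` knob is illustrative."""
    return {k: w + scale * q_vector.get(k, torch.zeros_like(w))
            for k, w in receiver_sd.items()}
```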

🧠 AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression

Researchers propose SoLA, a training-free compression method for large language models that combines soft activation sparsity and low-rank decomposition. The method achieves significant compression while improving performance, demonstrating 30% compression on LLaMA-2-70B with reduced perplexity from 6.95 to 4.44 and 10% better downstream task accuracy.
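
Only the low-rank half of the method is easy to sketch from the summary; a truncated-SVD factorization of a single weight matrix (the soft activation-sparsity component is not shown):

```python
import torch

def low_rank_factorize(weight, rank):
    """Truncated-SVD factorization of a dense (out_features x in_features) weight
    into two thin factors; replacing W with A @ B shrinks parameters whenever
    rank * (out + in) < out * in."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]          # (out_features, rank)
    B = Vh[:rank, :]                    # (rank, in_features)
    return A, B                         # for nn.Linear: y = x @ B.T @ A.T
```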

🏒 Perplexity
🧠 AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

StableTTA: Training-Free Test-Time Adaptation that Improves Model Accuracy on ImageNet1K to 96%

Researchers developed StableTTA, a training-free method that significantly improves AI model accuracy on ImageNet-1K, with 33 models achieving over 95% accuracy and several surpassing 96%. The method allows lightweight architectures to outperform Vision Transformers while using 95% fewer parameters and 89% less computational cost.

🧠 AI · Bullish · arXiv – CS AI · Mar 27 · 7/10

SWAA: Sliding Window Attention Adaptation for Efficient and Quality Preserving Long Context Processing

Researchers propose SWAA (Sliding Window Attention Adaptation), a toolkit that enables efficient long-context processing in large language models by adapting full attention models to sliding window attention without expensive retraining. The solution achieves 30-100% speedups for long context inference while maintaining acceptable performance quality through four core strategies that address training-inference mismatches.
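
The four adaptation strategies are not detailed in the summary; below, as a point of reference, is only the sliding-window causal mask that full-attention models are adapted to:

```python
import torch

def sliding_window_mask(seq_len, window):
    """Boolean mask where query i may attend only to keys in (i - window, i]:
    causal attention restricted to the last `window` positions."""
    idx = torch.arange(seq_len)
    rel = idx[:, None] - idx[None, :]        # query index minus key index
    return (rel >= 0) & (rel < window)       # True = key is visible to the query

print(sliding_window_mask(6, 3).int())       # e.g. row 5 attends to positions 3, 4, 5
```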

🧠 AI · Bullish · Decrypt – AI · Mar 17 · 7/10

OpenAI Releases GPT-5.4 Mini and Nano, Which Could Be More Useful Than the Big Model

OpenAI has released GPT-5.4 Mini and Nano, smaller versions of their flagship model that offer faster performance and lower costs. These compact models are positioned as more practical solutions for everyday business and developer use cases compared to the full-sized GPT-5.4 model.

🏒 OpenAI · 🧠 GPT-5
🧠 AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

Punctuated Equilibria in Artificial Intelligence: The Institutional Scaling Law and the Speciation of Sovereign AI

Researchers challenge the assumption of continuous AI progress, proposing that AI development follows punctuated equilibrium patterns with rapid phase transitions. They introduce the Institutional Scaling Law, proving that larger AI models don't always perform better in institutional environments due to trust, cost, and compliance factors.

🧠 AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration

Researchers propose LESA, a new framework that accelerates Diffusion Transformers (DiTs) by up to 6.25x using learnable predictors and Kolmogorov-Arnold Networks. The method achieves significant speedups while maintaining or improving generation quality in text-to-image and text-to-video synthesis tasks.

🧠 AI · Neutral · arXiv – CS AI · Mar 17 · 7/10

The Institutional Scaling Law: Non-Monotonic Fitness, Capability-Trust Divergence, and Symbiogenetic Scaling in Generative AI

Researchers propose the Institutional Scaling Law, challenging the assumption that AI performance improves monotonically with model size. The framework shows that institutional fitness (capability, trust, affordability, sovereignty) has an optimal scale beyond which capability and trust diverge, suggesting orchestrated domain-specific models may outperform large generalist models.

🧠 AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

Why Inference in Large Models Becomes Decomposable After Training

Researchers have discovered that large AI models develop decomposable internal structures during training, with many parameter dependencies remaining statistically unchanged from initialization. They propose a post-training method to identify and remove unsupported dependencies, enabling parallel inference without modifying model functionality.

🧠 AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training

Researchers propose a new framework called On-Policy SFT that bridges the performance gap between supervised fine-tuning and reinforcement learning in AI model training. The framework introduces Distribution Discriminant Theory (DDT) and two techniques, In-Distribution Finetuning and Hinted Decoding, that achieve better generalization while maintaining computational efficiency.

🧠 AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

ERC-SVD: Error-Controlled SVD for Large Language Model Compression

Researchers propose ERC-SVD, a new compression method for large language models that uses error-controlled singular value decomposition to reduce model size while maintaining performance. The method addresses truncation loss and error propagation issues in existing SVD-based compression techniques by leveraging residual matrices and selectively compressing only the last few layers.
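
A generic error-controlled rank selection, shown here as a sketch, captures the flavour; ERC-SVD's exact criterion and its residual-matrix handling are not specified in the summary:

```python
import torch

def rank_for_error_budget(weight, max_rel_error):
    """Smallest SVD rank whose reconstruction keeps the relative Frobenius error
    under `max_rel_error`; returns the rank and the two retained factors."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    energy = S ** 2
    tail = torch.flip(torch.cumsum(torch.flip(energy, [0]), 0), [0])  # energy lost if we keep rank i
    rel_err = torch.sqrt(tail / energy.sum())
    hits = (rel_err <= max_rel_error).nonzero()
    rank = int(hits[0]) if len(hits) else len(S)
    return rank, U[:, :rank] * S[:rank], Vh[:rank, :]
```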

🧠 AI · Bullish · arXiv – CS AI · Mar 17 · 7/10

HO-SFL: Hybrid-Order Split Federated Learning with Backprop-Free Clients and Dimension-Free Aggregation

Researchers propose HO-SFL (Hybrid-Order Split Federated Learning), a new framework that enables memory-efficient fine-tuning of large AI models on edge devices by eliminating backpropagation on client devices while maintaining convergence speed comparable to traditional methods. The approach significantly reduces communication costs and memory requirements for distributed AI training.

🧠 AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

AI Model Modulation with Logits Redistribution

Researchers propose AIM, a novel AI model modulation paradigm that allows a single model to exhibit diverse behaviors without maintaining multiple specialized versions. The approach uses logits redistribution to enable dynamic control over output quality and input feature focus without requiring retraining or additional training data.
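
As a rough illustration of decode-time logits redistribution (the token subset and the additive strength knob are stand-ins, not AIM's actual rule):

```python
import torch

def redistribute_logits(logits, boost_ids, strength=2.0):
    """Shift probability mass toward a chosen token subset at decode time,
    without retraining or extra data; returns renormalised log-probabilities."""
    adjusted = logits.clone()
    adjusted[..., boost_ids] += strength         # favour the chosen subset
    return torch.log_softmax(adjusted, dim=-1)
```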

🧠 Llama
🧠 AI · Bullish · arXiv – CS AI · Mar 16 · 7/10

Revisiting Model Stitching In the Foundation Model Era

Researchers introduce improved methods for stitching Vision Foundation Models (VFMs) like CLIP and DINOv2, enabling integration of different models' strengths. The study proposes VFM Stitch Tree (VST) technique that allows controllable accuracy-latency trade-offs for multimodal applications.
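
The basic stitch unit is a small learned map between two frozen feature spaces; the VST tree construction and trade-off search are not shown in this sketch:

```python
import torch.nn as nn

class LinearStitch(nn.Module):
    """Learned linear map from the feature space of a frozen front model
    (e.g. early CLIP blocks) into the space a frozen back model (e.g. later
    DINOv2 blocks) expects; only this projection is trained."""
    def __init__(self, front_dim, back_dim):
        super().__init__()
        self.proj = nn.Linear(front_dim, back_dim)

    def forward(self, front_features):
        return self.proj(front_features)
```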
