98 articles tagged with #model-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AIBullisharXiv β CS AI Β· 1d ago7/10
π§ Researchers introduce Criticality-Aware Adversarial Training (CAAT), a parameter-efficient method that identifies and fine-tunes only the most robustness-critical parameters in Vision Transformers, achieving 94.3% of standard adversarial training robustness while tuning just 6% of model parameters. This breakthrough addresses the computational bottleneck preventing large-scale adversarial training deployment.
AIBullisharXiv β CS AI Β· 1d ago7/10
π§ Researchers present Chain-of-Models Pre-Training (CoM-PT), a novel method that accelerates vision foundation model training by up to 7.09X through sequential knowledge transfer from smaller to larger models in a unified pipeline, rather than training each model independently. The approach maintains or improves performance while significantly reducing computational costs, with efficiency gains increasing as more models are added to the training sequence.
AIBullisharXiv β CS AI Β· 2d ago7/10
π§ Researchers present MoEITS, a novel algorithm for simplifying Mixture-of-Experts large language models while maintaining performance and reducing computational costs. The method outperforms existing pruning techniques across multiple benchmark models including Mixtral 8Γ7B and DeepSeek-V2-Lite, addressing the energy and resource efficiency challenges of deploying advanced LLMs.
AINeutralarXiv β CS AI Β· 2d ago7/10
π§ Researchers demonstrate that integrating fairness metrics directly into AutoML optimization improves algorithmic fairness by 14.5% while reducing data usage by 35.7%, though at the cost of a 9.4% decrease in predictive accuracy. This study challenges the industry standard of prioritizing performance over fairness and shows that simpler, fairer ML models can achieve practical balance without requiring complex architectures.
π’ Meta
AIBullisharXiv β CS AI Β· 2d ago7/10
π§ Researchers demonstrate that Reinforcement Learning from Verifiable Rewards (RLVR) can train Large Language Models to negotiate effectively in incomplete-information games like price bargaining. A 30B parameter model trained with this method outperforms frontier models 10x its size and develops sophisticated persuasive strategies while generalizing to unseen negotiation scenarios.
AIBullisharXiv β CS AI Β· 2d ago7/10
π§ Researchers introduce Introspective Diffusion Language Models (I-DLM), a new approach that combines the parallel generation speed of diffusion models with the quality of autoregressive models by ensuring models verify their own outputs. I-DLM achieves performance matching conventional large language models while delivering 3x higher throughput, potentially reshaping how AI systems are deployed at scale.
AIBullisharXiv β CS AI Β· 2d ago7/10
π§ Researchers propose Min-k Sampling, a novel decoding strategy for large language models that dynamically identifies semantic cliffs in logit distributions to optimize token truncation. Unlike temperature-sensitive methods like Top-k and Top-p, Min-k achieves temperature invariance through relative logit dynamics while maintaining superior text quality across reasoning, creative writing, and human evaluation benchmarks.
AIBullisharXiv β CS AI Β· 3d ago7/10
π§ Researchers introduce the Two-Stage Decision-Sampling Hypothesis to explain how reinforcement learning enables self-reflection capabilities in large language models, demonstrating that RL's superior performance stems from improved decision-making rather than generation quality. The theory shows that reward gradients distribute asymmetrically across policy components, explaining why RL succeeds where supervised fine-tuning fails.
AIBearisharXiv β CS AI Β· 3d ago7/10
π§ Research demonstrates that layer pruningβa compression technique for large language modelsβeffectively reduces model size while maintaining classification performance, but critically fails to preserve generative reasoning capabilities like arithmetic and code generation. Even with extensive post-training on 400B tokens, models cannot recover lost reasoning abilities, revealing fundamental limitations in current compression approaches.
AI Γ CryptoBullishThe Register β AI Β· 4d ago7/10
π€A widening performance gap between proprietary enterprise AI models and open-source alternatives is reshaping the AI landscape, with open-weight models gaining prominence as organizations seek cost-effective and customizable solutions. This shift challenges the dominance of closed models and creates new opportunities for developers and businesses to leverage decentralized AI infrastructure.
AIBullisharXiv β CS AI Β· 6d ago7/10
π§ Researchers introduce Perception-Grounded Policy Optimization (PGPO), a novel fine-tuning framework that improves how large vision-language models learn from visual inputs by strategically allocating learning signals to vision-dependent tokens rather than treating all tokens equally. Testing on the Qwen2.5-VL series demonstrates an average 18.7% performance boost across multimodal reasoning benchmarks.
AIBullisharXiv β CS AI Β· Apr 77/10
π§ Researchers have developed a zero-shot quantization method that transfers robustness between AI models through weight-space arithmetic, improving post-training quantization performance by up to 60% without requiring additional training. This breakthrough enables low-cost deployment of extremely low-bit models by extracting 'quantization vectors' from donor models to patch receiver models.
AIBullisharXiv β CS AI Β· Apr 77/10
π§ Researchers propose SoLA, a training-free compression method for large language models that combines soft activation sparsity and low-rank decomposition. The method achieves significant compression while improving performance, demonstrating 30% compression on LLaMA-2-70B with reduced perplexity from 6.95 to 4.44 and 10% better downstream task accuracy.
π’ Perplexity
AIBullisharXiv β CS AI Β· Apr 77/10
π§ Researchers developed StableTTA, a training-free method that significantly improves AI model accuracy on ImageNet-1K, with 33 models achieving over 95% accuracy and several surpassing 96%. The method allows lightweight architectures to outperform Vision Transformers while using 95% fewer parameters and 89% less computational cost.
AIBullisharXiv β CS AI Β· Mar 277/10
π§ Researchers propose SWAA (Sliding Window Attention Adaptation), a toolkit that enables efficient long-context processing in large language models by adapting full attention models to sliding window attention without expensive retraining. The solution achieves 30-100% speedups for long context inference while maintaining acceptable performance quality through four core strategies that address training-inference mismatches.
AIBullishDecrypt β AI Β· Mar 177/10
π§ OpenAI has released GPT-5.4 Mini and Nano, smaller versions of their flagship model that offer faster performance and lower costs. These compact models are positioned as more practical solutions for everyday business and developer use cases compared to the full-sized GPT-5.4 model.
π’ OpenAIπ§ GPT-5
AINeutralarXiv β CS AI Β· Mar 177/10
π§ Researchers challenge the assumption of continuous AI progress, proposing that AI development follows punctuated equilibrium patterns with rapid phase transitions. They introduce the Institutional Scaling Law, proving that larger AI models don't always perform better in institutional environments due to trust, cost, and compliance factors.
AIBullisharXiv β CS AI Β· Mar 177/10
π§ Researchers propose LESA, a new framework that accelerates Diffusion Transformers (DiTs) by up to 6.25x using learnable predictors and Kolmogorov-Arnold Networks. The method achieves significant speedups while maintaining or improving generation quality in text-to-image and text-to-video synthesis tasks.
AINeutralarXiv β CS AI Β· Mar 177/10
π§ Researchers propose the Institutional Scaling Law, challenging the assumption that AI performance improves monotonically with model size. The framework shows that institutional fitness (capability, trust, affordability, sovereignty) has an optimal scale beyond which capability and trust diverge, suggesting orchestrated domain-specific models may outperform large generalist models.
AIBullisharXiv β CS AI Β· Mar 177/10
π§ Researchers have discovered that large AI models develop decomposable internal structures during training, with many parameter dependencies remaining statistically unchanged from initialization. They propose a post-training method to identify and remove unsupported dependencies, enabling parallel inference without modifying model functionality.
AIBullisharXiv β CS AI Β· Mar 177/10
π§ Researchers propose a new framework called On-Policy SFT that bridges the performance gap between supervised fine-tuning and reinforcement learning in AI model training. The framework introduces Distribution Discriminant Theory (DDT) and two techniques - In-Distribution Finetuning and Hinted Decoding - that achieve better generalization while maintaining computational efficiency.
AIBullisharXiv β CS AI Β· Mar 177/10
π§ Researchers propose ERC-SVD, a new compression method for large language models that uses error-controlled singular value decomposition to reduce model size while maintaining performance. The method addresses truncation loss and error propagation issues in existing SVD-based compression techniques by leveraging residual matrices and selectively compressing only the last few layers.
AIBullisharXiv β CS AI Β· Mar 177/10
π§ Researchers propose HO-SFL (Hybrid-Order Split Federated Learning), a new framework that enables memory-efficient fine-tuning of large AI models on edge devices by eliminating backpropagation on client devices while maintaining convergence speed comparable to traditional methods. The approach significantly reduces communication costs and memory requirements for distributed AI training.
AIBullisharXiv β CS AI Β· Mar 167/10
π§ Researchers propose AIM, a novel AI model modulation paradigm that allows a single model to exhibit diverse behaviors without maintaining multiple specialized versions. The approach uses logits redistribution to enable dynamic control over output quality and input feature focus without requiring retraining or additional training data.
π§ Llama
AIBullisharXiv β CS AI Β· Mar 167/10
π§ Researchers introduce improved methods for stitching Vision Foundation Models (VFMs) like CLIP and DINOv2, enabling integration of different models' strengths. The study proposes VFM Stitch Tree (VST) technique that allows controllable accuracy-latency trade-offs for multimodal applications.