98 articles tagged with #model-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.
AI · Bullish · arXiv – CS AI · Mar 12 · 7/10
🧠 Researchers developed a method using neural cellular automata (NCA) to generate synthetic data for pre-training language models, achieving up to 6% improvement in downstream performance with only 164M synthetic tokens. This approach outperformed traditional pre-training on 1.6B natural language tokens while being more computationally efficient and transferring well to reasoning benchmarks.
AI · Bullish · arXiv – CS AI · Mar 12 · 7/10
🧠 Researchers propose Mashup Learning, a method that leverages historical model checkpoints to improve AI training efficiency. The technique identifies relevant past training runs, merges them, and uses the result as initialization, achieving 0.5-5% accuracy improvements while reducing training time by up to 37%.
AI · Bullish · arXiv – CS AI · Mar 12 · 7/10
🧠 Researchers introduce Super Neurons (SNs), a new method that probes raw activations in Vision Language Models to improve classification performance while achieving up to 5.10x speedup. Unlike Sparse Attention Vectors, SNs can identify discriminative neurons in shallow layers, enabling extreme early exiting from the first layer at the first generated token.
AI · Bullish · arXiv – CS AI · Mar 9 · 7/10
🧠 Researchers introduce SpecEM, a new training-free framework for ensembling large language models that dynamically adjusts each model's contribution based on real-time performance. The system uses speculative decoding principles and online feedback mechanisms to improve collaboration between different LLMs, showing consistent performance improvements across multiple benchmark datasets.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠 Researchers developed COREA, a system that combines small and large language models to reduce AI reasoning costs by 21.5% while maintaining nearly identical accuracy. The system uses confidence scoring to decide when to escalate questions from cheaper small models to more expensive large models.
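The escalation idea in the COREA summary can be sketched as a two-tier cascade. This is a minimal illustration, not COREA's implementation: the summary does not say how confidence is computed, so the `(answer, confidence)` model interface, the `threshold` value, and both toy models are assumptions made for the example.

```python
from typing import Callable, Tuple

# Hypothetical interface: a model maps a question to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade(question: str, small_model: Model, large_model: Model,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Answer with the small model unless its confidence falls below threshold."""
    answer, confidence = small_model(question)
    if confidence >= threshold:
        return answer, "small"
    # Low confidence: escalate to the more expensive model.
    answer, _ = large_model(question)
    return answer, "large"


def toy_small(question: str) -> Tuple[str, float]:
    """Toy stand-in: confident only on short questions."""
    if len(question.split()) <= 5:
        return "small-answer", 0.95
    return "small-guess", 0.40


def toy_large(question: str) -> Tuple[str, float]:
    """Toy stand-in for the expensive model."""
    return "large-answer", 0.99


easy = cascade("What is 2 + 2?", toy_small, toy_large)
hard = cascade("Explain why the proof fails when n is even and large",
               toy_small, toy_large)
```

In a cascade like this, the cost saving depends on how often the small model clears the threshold, so the threshold trades cost against accuracy.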
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠 Researchers propose Supervised Calibration (SC), a new framework to improve In-Context Learning performance in Large Language Models by addressing systematic biases through optimal affine transformations in logit space. The method achieves state-of-the-art results across multiple LLMs including Mistral-7B, Llama-2-7B, and Qwen2-7B in few-shot learning scenarios.
🧠 Llama
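To make "affine transformation in logit space" from the Supervised Calibration summary concrete, the sketch below applies a given weight matrix and bias to a logit vector before the softmax. How the transform is learned is not described in the summary, so the `weight` and `bias` values here are hand-picked assumptions for illustration only.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def affine_calibrate(logits: List[float], weight: List[List[float]],
                     bias: List[float]) -> List[float]:
    """Return weight @ logits + bias (one output per class)."""
    return [sum(w * x for w, x in zip(row, logits)) + b
            for row, b in zip(weight, bias)]


def argmax(values: List[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


# Assumed example: the raw logits favour class 0 because of a systematic bias.
raw_logits = [2.0, 1.0]
weight = [[1.0, 0.0], [0.0, 1.0]]   # identity: only the bias term is active
bias = [-1.5, 0.0]                  # hand-picked correction, not a learned value

calibrated = affine_calibrate(raw_logits, weight, bias)
raw_prediction = argmax(softmax(raw_logits))
calibrated_prediction = argmax(softmax(calibrated))
```

With an identity weight matrix this reduces to a per-class bias shift. A non-identity matrix can also rescale or mix the logits.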
AI · Neutral · arXiv – CS AI · Mar 5 · 6/10
🧠 Researchers reproduced and analyzed severe accuracy degradation in BERT transformer models when applying post-training quantization, showing validation accuracy drops from 89.66% to 54.33%. The study found that structured activation outliers intensify with model depth, with mixed precision quantization being the most effective mitigation strategy.
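A toy example shows why activation outliers hurt post-training quantization. It is a sketch with made-up numbers and a simple symmetric per-tensor int8 scheme, not the study's setup: one large value stretches the quantization scale, so the remaining small values lose most of their resolution.

```python
from typing import List


def quantize_dequantize(values: List[float], bits: int = 8) -> List[float]:
    """Symmetric per-tensor quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax                          # one scale for the whole tensor
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_abs_error(original: List[float], restored: List[float]) -> float:
    return sum(abs(o - r) for o, r in zip(original, restored)) / len(original)


# Made-up activations: four typical values, then the same values plus one outlier.
typical = [0.1, -0.2, 0.3, -0.4]
with_outlier = typical + [60.0]

clean_error = mean_abs_error(typical, quantize_dequantize(typical))
# Measure the error on the typical values only, after the outlier set the scale.
restored = quantize_dequantize(with_outlier)
outlier_error = mean_abs_error(typical, restored[:len(typical)])
```

Mixed precision, in the general sense, sidesteps this by keeping outlier-heavy tensors in a wider format, so their range does not dictate the scale used for everything else.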
AI · Bullish · arXiv – CS AI · Mar 5 · 6/10
🧠 Researchers introduce Concentration-Alignment Transforms (CAT), a new method to reduce quantization error in large language and vision models by improving both weight/activation concentration and alignment. The technique consistently matches or outperforms existing quantization methods at 4-bit precision across several LLMs.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠 Researchers have developed Spectral Surgery, a training-free method to improve LoRA (Low-Rank Adaptation) model performance by reweighting singular values based on gradient sensitivity. The technique achieves significant performance gains (up to +4.4 points on CommonsenseQA) by adjusting only about 1,000 scalar coefficients without requiring retraining.
🧠 Llama
AI · Bullish · arXiv – CS AI · Mar 4 · 6/10 · 3
🧠 Researchers introduce SiNGER, a new knowledge distillation framework for Vision Transformers that suppresses harmful high-norm artifacts while preserving informative signals. The technique uses nullspace-guided perturbation and LoRA-based adapters to achieve state-of-the-art performance in downstream tasks.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Researchers developed a training method for large-scale Mixture-of-Experts (MoE) models using FP4 precision on Hopper GPUs without native 4-bit support. The technique achieves 14.8% memory reduction and 12.5% throughput improvement for 671B parameter models by using FP4 for activations while keeping core computations in FP8.
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 2
🧠 DiaBlo introduces a new Parameter-Efficient Fine-Tuning (PEFT) method that updates only diagonal blocks of weight matrices in large language models, offering better performance than LoRA while maintaining similar memory efficiency. The approach eliminates the need for low-rank matrix products and provides theoretical guarantees for convergence, showing competitive results across various AI tasks including reasoning and code generation.
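The phrase "updates only diagonal blocks" can be pictured with a mask over a weight matrix. The sketch below is an illustration under stated assumptions (a square matrix split into equal-size blocks, with hypothetical helper names), not DiaBlo's actual code. Only entries inside the diagonal blocks receive an update.

```python
from typing import List

Matrix = List[List[float]]


def diagonal_block_mask(dim: int, num_blocks: int) -> List[List[bool]]:
    """True where entry (i, j) lies inside one of the diagonal blocks."""
    if dim % num_blocks != 0:
        raise ValueError("dim must be divisible by num_blocks")
    block = dim // num_blocks
    return [[(i // block) == (j // block) for j in range(dim)]
            for i in range(dim)]


def apply_masked_update(weight: Matrix, update: Matrix,
                        mask: List[List[bool]]) -> Matrix:
    """Add the update only where the mask is True; other entries stay frozen."""
    return [[w + (u if m else 0.0) for w, u, m in zip(w_row, u_row, m_row)]
            for w_row, u_row, m_row in zip(weight, update, mask)]


dim, num_blocks = 4, 2
mask = diagonal_block_mask(dim, num_blocks)
trainable = sum(cell for row in mask for cell in row)   # dim * dim / num_blocks

weight = [[0.0] * dim for _ in range(dim)]
update = [[1.0] * dim for _ in range(dim)]
new_weight = apply_masked_update(weight, update, mask)
```

Under these assumptions, the trainable entry count is `dim * dim / num_blocks`, so more blocks mean fewer trainable parameters.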
AI · Bullish · arXiv – CS AI · Mar 4 · 7/10 · 3
🧠 Researchers have identified a critical flaw in reinforcement learning fine-tuning of large language models that causes degradation in multi-attempt performance despite improvements in single attempts. Their proposed solution, Diversity-Preserving Hybrid RL (DPH-RL), uses mass-covering f-divergences to maintain model diversity and prevent catastrophic forgetting while improving training efficiency.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 5
🧠 Researchers developed HierarchicalPrune, a compression framework that reduces large-scale text-to-image diffusion models' memory footprint by 77.5-80.4% and latency by 27.9-38.0% while maintaining image quality. The technique enables billion-parameter AI models to run efficiently on resource-constrained devices through hierarchical pruning and knowledge distillation.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠 Researchers analyzed compression effects on large reasoning models (LRMs) through quantization, distillation, and pruning methods. They found that dynamically quantized 2.51-bit models maintain near-original performance, while identifying critical weight components and showing that protecting just 2% of excessively compressed weights can improve accuracy by 6.57%.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠 Researchers propose Metacognitive Behavioral Tuning (MBT), a new framework that addresses structural fragility in Large Reasoning Models by injecting human-like self-regulatory control into AI thought processes. The approach reduces reasoning collapse and improves accuracy while consuming fewer computational tokens across multi-hop question-answering benchmarks.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers introduce Dual-Iterative Preference Optimization (Dual-IPO), a new method that iteratively improves both reward models and video generation models to create higher-quality AI-generated videos better aligned with human preferences. The approach enables smaller 2B parameter models to outperform larger 5B models without requiring manual preference annotations.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 6
🧠 Researchers propose Supervised Reinforcement Learning (SRL), a new training framework that helps small-scale language models solve complex multi-step reasoning problems by generating internal reasoning monologues and providing step-wise rewards. SRL outperforms traditional Supervised Fine-Tuning and Reinforcement Learning approaches, enabling smaller models to tackle previously unlearnable problems.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10 · 5
🧠 Researchers developed a new approach to quantization-aware training (QAT) that optimizes compute allocation between full-precision and quantized training phases. They found that, contrary to previous findings, the optimal ratio of QAT to full-precision training increases with total compute budget, and they derived scaling laws to predict optimal configurations across different model sizes and bit widths.
AI · Bullish · Google Research Blog · Aug 7 · 7/10 · 8
🧠 Research demonstrates a breakthrough method for achieving a 10,000x reduction in training data requirements while maintaining high-fidelity labels in machine learning systems. This advancement focuses on human-computer interaction and visualization techniques to optimize data efficiency in AI training processes.
AI · Neutral · arXiv – CS AI · 1d ago · 6/10
🧠 Researchers propose CoDe-R, a two-stage framework using Large Language Models to improve binary decompilation by reducing logical errors and semantic misalignment. A 1.3B model using this approach achieves state-of-the-art performance on the HumanEval-Decompile benchmark, becoming the first lightweight model to exceed 50% re-executability rates.
AI · Neutral · arXiv – CS AI · 1d ago · 6/10
🧠 Researchers analyzed how LLM verifiers assess solution correctness in test-time scaling scenarios, revealing that verification effectiveness varies significantly with problem difficulty, generator strength, and verifier capability. The study demonstrates that weak generators can nearly match stronger ones post-verification and that verifier scaling alone cannot solve fundamental verification challenges.
🧠 GPT-4
AI · Neutral · arXiv – CS AI · 1d ago · 6/10
🧠 Researchers present a layer-wise analysis of Supervised Fine-Tuning (SFT) in large language models, revealing that middle layers remain stable during training while final layers exhibit high sensitivity. They introduce Mid-Block Efficient Tuning, a targeted approach that selectively updates intermediate layers and achieves up to 10.2% performance gains over standard LoRA on benchmarks with significantly reduced parameter overhead.
AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠 Researchers demonstrate that inducing specific personas in Large Language Models produces measurable shifts in cognitive task performance, with effects showing 73.68% alignment to human personality-cognition relationships. The study introduces Dynamic Persona Routing, a lightweight strategy that optimizes LLM performance by dynamically selecting personas based on query type, outperforming static persona approaches without additional training.
AI · Bullish · arXiv – CS AI · 2d ago · 6/10
🧠 Researchers present Data Mixing Agent, an AI framework that uses reinforcement learning to automatically optimize how large language models balance training data from source and target domains during continual pre-training. The approach outperforms manual reweighting strategies while generalizing across different models, domains, and fields without requiring retraining.