y0news

#model-optimization News & Analysis

98 articles tagged with #model-optimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠

Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?

Researchers identify that reasoning language models exhibit worse performance in low-resource languages due to failures in language understanding rather than reasoning capability itself. The study proposes Selective Translation, which strategically adds English translations only when understanding failures are detected, achieving near full-translation performance while translating just 20% of inputs.
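The selective-translation idea lends itself to a short sketch. This is a hypothetical illustration, not the paper's implementation: `understands` and `translate` are stand-in callables, and the threshold is invented.

```python
# Hypothetical sketch of the Selective Translation idea: only add an
# English translation when an understanding-failure signal fires.

def selective_translate(inputs, understands, translate, threshold=0.5):
    """Keep each input as-is when the understanding score clears the
    threshold; otherwise append a translation. Returns the outputs and
    the fraction of inputs that needed translation."""
    out, translated = [], 0
    for text in inputs:
        if understands(text) >= threshold:
            out.append(text)
        else:
            out.append(text + " || " + translate(text))
            translated += 1
    return out, translated / len(inputs)

# Toy stand-ins: "understanding" fails for non-ASCII-heavy strings.
score = lambda s: sum(c.isascii() for c in s) / len(s)
to_en = lambda s: f"<en:{s}>"

batch = ["hello world", "bonjour", "こんにちは", "ok"]
outputs, frac = selective_translate(batch, score, to_en)
print(frac)  # → 0.25
```

Only the input that trips the failure signal pays the translation cost, which is the paper's route to near full-translation quality at a fraction of the translation volume.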

AI · Neutral · arXiv – CS AI · 2d ago · 6/10
🧠

Agent^2 RL-Bench: Can LLM Agents Engineer Agentic RL Post-Training?

Researchers introduce Agent^2 RL-Bench, a benchmark testing whether LLM agents can autonomously design and execute reinforcement learning pipelines to improve foundation models. Testing across multiple agent systems reveals significant performance variation, with online RL succeeding primarily on ALFWorld while supervised learning pipelines dominate under fixed computational budgets.

AI · Bullish · arXiv – CS AI · 3d ago · 6/10
🧠

E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning

Researchers introduce E3-TIR, a new training paradigm for Large Language Models that improves tool-use reasoning by combining expert guidance with self-exploration. The method achieves 6% performance gains while using less than 10% of typical synthetic data, addressing key limitations in current reinforcement learning approaches for AI agents.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠

AgentGate: A Lightweight Structured Routing Engine for the Internet of Agents

AgentGate introduces a lightweight routing engine that optimizes how AI agents communicate and dispatch tasks across distributed systems by treating routing as a constrained decision problem rather than open-ended text generation. The system uses a two-stage approach (action decision followed by structural grounding) and demonstrates that compact 3B-7B parameter models can achieve competitive performance while operating under resource, latency, and privacy constraints.

AI · Neutral · arXiv – CS AI · 6d ago · 6/10
🧠

A Comparative Study of Demonstration Selection for Practical Large Language Models-based Next POI Prediction

Researchers conducted a comparative analysis of demonstration selection strategies for using large language models to predict users' next point-of-interest (POI) based on historical location data. The study found that simple heuristic methods like geographical proximity and temporal ordering outperform complex embedding-based approaches in both computational efficiency and prediction accuracy, with LLMs using these heuristics sometimes matching fine-tuned model performance without additional training.

AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠

MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition

Researchers propose MUXQ, a new quantization technique for large language models that addresses activation outliers through low-rank decomposition. The method enables efficient INT8 quantization while maintaining accuracy close to FP16, making it suitable for edge device deployment with NPU-based hardware.
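The outlier-decomposition intuition can be shown numerically. A minimal sketch under my own assumptions (a rank-1 peel-off of one outlier column, symmetric per-tensor INT8), not the MUXQ algorithm itself:

```python
# Minimal numeric sketch of the idea behind low-rank outlier
# decomposition (my reading, not the paper's algorithm): peel off an
# outlier channel as a rank-1 term, and the residual's much smaller
# range quantizes to INT8 with far less error than the raw matrix.

def quantize_int8(mat):
    """Symmetric per-tensor INT8 quantize-dequantize round trip."""
    amax = max(abs(v) for row in mat for v in row) or 1.0
    scale = amax / 127
    return [[round(v / scale) * scale for v in row] for row in mat]

def max_err(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Column 1 holds activation-style outliers.
W = [[0.1, 50.0, -0.2],
     [-0.3, 49.0, 0.4]]

# Rank-1 outlier term: keep column 1 in higher precision.
outlier = [[0.0, row[1], 0.0] for row in W]
residual = [[v - o for v, o in zip(rw, ro)] for rw, ro in zip(W, outlier)]

naive = quantize_int8(W)
split = [[q + o for q, o in zip(rq, ro)]
         for rq, ro in zip(quantize_int8(residual), outlier)]

assert max_err(W, split) < max_err(W, naive)
```

The outlier column forces a coarse quantization grid on the whole tensor in the naive case; after decomposition, the residual's grid is roughly two orders of magnitude finer.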

🏢 Perplexity
AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠

Haiku to Opus in Just 10 bits: LLMs Unlock Massive Compression Gains

Researchers developed new compression techniques for LLM-generated text, achieving massive compression ratios through domain-adapted LoRA adapters and an interactive 'Question-Asking' protocol. The QA method uses binary questions to transfer knowledge between small and large models, achieving compression ratios of 0.0006-0.004 while recovering 23-72% of capability gaps.

AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠

Efficient3D: A Unified Framework for Adaptive and Debiased Token Reduction in 3D MLLMs

Researchers have developed Efficient3D, a framework that accelerates 3D Multimodal Large Language Models (MLLMs) while maintaining accuracy through adaptive token pruning. The system uses a Debiased Visual Token Importance Estimator and Adaptive Token Rebalancing to reduce computational overhead without sacrificing performance, showing +2.57% CIDEr improvement on benchmarks.
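Budgeted token pruning of this kind reduces to a top-k selection. A generic sketch, not Efficient3D's debiased estimator or rebalancing scheme:

```python
# Generic token-pruning sketch: score each visual token, then keep a
# budgeted top-k subset while preserving the original token order.

def prune_tokens(tokens, scores, keep_ratio=0.5):
    k = max(1, int(len(tokens) * keep_ratio))
    ranked = sorted(range(len(tokens)), key=lambda i: -scores[i])
    keep = sorted(ranked[:k])          # restore sequence order
    return [tokens[i] for i in keep]

tokens = ["t0", "t1", "t2", "t3"]
scores = [0.1, 0.9, 0.4, 0.8]
print(prune_tokens(tokens, scores))   # → ['t1', 't3']
```

The papers' contributions sit in how the scores are computed and debiased; the pruning step itself stays this simple.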

AI · Bullish · arXiv – CS AI · Apr 6 · 6/10
🧠

Rubrics to Tokens: Bridging Response-level Rubrics and Token-level Rewards in Instruction Following Tasks

Researchers propose Rubrics to Tokens (RTT), a novel reinforcement learning framework that improves Large Language Model alignment by bridging response-level and token-level rewards. The method addresses reward sparsity and ambiguity issues in instruction-following tasks through fine-grained credit assignment and demonstrates superior performance across different models.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠

Towards Effective Experiential Learning: Dual Guidance for Utilization and Internalization

Researchers propose Dual Guidance Optimization (DGO), a new framework that improves large language model training by combining external experience banks with internal knowledge to better mimic human learning patterns. The approach shows consistent improvements over existing reinforcement learning methods for reasoning tasks.

AI · Bullish · arXiv – CS AI · Mar 26 · 6/10
🧠

Enhancing Efficiency and Performance in Deepfake Audio Detection through Neuron-level Dropin & Neuroplasticity Mechanisms

Researchers developed novel 'dropin' and 'plasticity' algorithms inspired by brain neuroplasticity to improve deepfake audio detection efficiency. The methods dynamically adjust neuron counts in model layers, achieving up to 66% reduction in error rates while improving computational efficiency across multiple architectures including ResNet and Wav2Vec.

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠

MESD: Detecting and Mitigating Procedural Bias in Intersectional Groups

Researchers propose MESD (Multi-category Explanation Stability Disparity), a new metric to detect procedural bias in AI models across intersectional groups. They also introduce the UEF framework, which balances utility, explanation quality, and fairness in machine learning systems.

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

Resolving Interference (RI): Disentangling Models for Improved Model Merging

Researchers have developed Resolving Interference (RI), a new framework that improves AI model merging by reducing cross-task interference when combining specialized models. The method makes models functionally orthogonal to other tasks using only unlabeled data, improving merging performance by up to 3.8% and generalization by up to 2.3%.
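The cross-task-interference intuition can be illustrated with plain Gram-Schmidt orthogonalization; this is not the RI method itself, just the geometric idea behind making one task's update "functionally orthogonal" to another's:

```python
# Toy interference removal: project one task vector orthogonal to
# another before merging, so adding it no longer moves the model
# along the other task's direction.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalize(v, ref):
    """Remove from v its component along ref (Gram-Schmidt step)."""
    coef = dot(v, ref) / dot(ref, ref)
    return [x - coef * r for x, r in zip(v, ref)]

task_a = [1.0, 0.0, 2.0]
task_b = [0.5, 1.0, 1.0]   # overlaps with task_a

b_clean = orthogonalize(task_b, task_a)
assert abs(dot(b_clean, task_a)) < 1e-9   # no cross-task component left
```

RI operates on model functions using unlabeled data rather than on raw parameter vectors, but the goal is the same: zero out the component that interferes with other tasks.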

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠

Mitigating Overthinking in Large Reasoning Language Models via Reasoning Path Deviation Monitoring

Researchers propose a new early-exit method for Large Reasoning Language Models that detects and prevents overthinking by monitoring high-entropy transition tokens that indicate deviation from correct reasoning paths. The method improves performance and efficiency compared to existing approaches without requiring additional training overhead or limiting inference throughput.
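Entropy-triggered early exit can be sketched in a few lines; the threshold and toy distributions here are invented for illustration and are not the paper's detector:

```python
# Toy entropy-based early exit: stop generating once a step's token
# distribution becomes high-entropy, taken here as a signal that the
# reasoning path is deviating.
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def early_exit_step(step_probs, limit=1.5):
    """Return the index of the first high-entropy step, or None."""
    for i, probs in enumerate(step_probs):
        if entropy(probs) > limit:
            return i
    return None

steps = [
    [0.9, 0.05, 0.05],    # confident
    [0.8, 0.1, 0.1],      # confident
    [0.34, 0.33, 0.33],   # near-uniform: deviation signal
]
print(early_exit_step(steps))  # → 2
```

Because the monitor only reads per-step distributions the model already produces, it adds no training overhead, which matches the paper's selling point.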

AI · Neutral · arXiv – CS AI · Mar 17 · 6/10
🧠

Compute Allocation for Reasoning-Intensive Retrieval Agents

Researchers studied computational resource allocation in AI retrieval systems for long-horizon agents, finding that re-ranking stages benefit more from powerful models and deeper candidate pools than query expansion stages. The study suggests concentrating compute power on re-ranking rather than distributing it uniformly across pipeline stages for better performance.

🧠 Gemini
AI · Bullish · arXiv – CS AI · Mar 12 · 6/10
🧠

One Model, Many Skills: Parameter-Efficient Fine-Tuning for Multitask Code Analysis

Researchers conducted the first comprehensive evaluation of parameter-efficient fine-tuning (PEFT) for multi-task code analysis, showing that a single PEFT module can match full fine-tuning performance while reducing computational costs by up to 85%. The study found that even 1B-parameter models with multi-task PEFT outperform large general-purpose LLMs like DeepSeek and CodeLlama on code analysis tasks.

AI · Neutral · arXiv – CS AI · Mar 9 · 6/10
🧠

When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On

Researchers propose Implicit Error Counting (IEC), a new reinforcement learning approach for training AI models in domains where multiple valid outputs exist and traditional rubric-based evaluation fails. The method focuses on counting what responses get wrong rather than what they get right, with validation shown in virtual try-on applications where it outperforms existing rubric-based methods.

AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠

Energy-Driven Adaptive Visual Token Pruning for Efficient Vision-Language Models

Researchers developed E-AdaPrune, an energy-driven adaptive pruning framework that optimizes Vision-Language Models by dynamically allocating visual tokens based on image information density. The method shows up to 0.6% average improvement across benchmarks, with a notable 5.1% boost on reasoning tasks, while adding only 8ms latency per image.

AI · Neutral · arXiv – CS AI · Mar 5 · 5/10
🧠

Local Shapley: Model-Induced Locality and Optimal Reuse in Data Valuation

Researchers propose Local Shapley, a new method that dramatically reduces computational complexity in data valuation by focusing only on training data points that actually influence specific predictions. The approach achieves substantial speedups while maintaining accuracy by leveraging model-induced locality properties.
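The locality argument is easiest to see with a 1-NN model, where only nearby training points can affect a prediction. Below is a toy exact-Shapley computation over a three-point neighborhood, my own illustration rather than the paper's estimator:

```python
# Exact Shapley values over a tiny neighborhood for a 1-NN "model":
# the value of a coalition is 1 if its nearest point predicts the
# query's label correctly. Points outside the neighborhood would have
# zero marginal contribution, which is the locality being exploited.
import math
from itertools import permutations

def value(subset, train, query_x, query_y):
    if not subset:
        return 0.0
    nearest = min(subset, key=lambda i: abs(train[i][0] - query_x))
    return float(train[nearest][1] == query_y)

def shapley(players, train, qx, qy):
    vals = {i: 0.0 for i in players}
    for order in permutations(players):
        seen = []
        for i in order:
            before = value(seen, train, qx, qy)
            seen.append(i)
            vals[i] += value(seen, train, qx, qy) - before
    n = math.factorial(len(players))
    return {i: v / n for i, v in vals.items()}

# (x, label); query at x=1.02 with true label "A".
train = [(1.0, "A"), (1.1, "B"), (9.0, "A")]
print(shapley([0, 1, 2], train, 1.02, "A"))
```

The nearby mislabeled-looking point gets a negative value, and the values sum to the full-coalition utility, as Shapley's efficiency axiom requires.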

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠

Stepwise Penalization for Length-Efficient Chain-of-Thought Reasoning

Researchers developed SWAP (Step-wise Adaptive Penalization), a new AI training method that makes large reasoning models more efficient by reducing unnecessary steps in chain-of-thought reasoning. The technique reduces reasoning length by 64.3% while improving accuracy by 5.7%, addressing the costly problem of AI models 'overthinking' during problem-solving.
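Length-penalized reward shaping of this general kind can be sketched directly; the step budget and penalty coefficient below are invented for illustration and are not SWAP's actual formulation:

```python
# Toy reward shaping in the spirit of step-wise length penalties:
# correct answers earn reward, and each reasoning step beyond a
# budget subtracts a fixed penalty.

def shaped_reward(correct, n_steps, budget=4, lam=0.125):
    penalty = lam * max(0, n_steps - budget)
    return (1.0 if correct else 0.0) - penalty

# A correct 10-step chain now scores below a correct 4-step chain,
# pushing the policy toward shorter reasoning.
print(shaped_reward(True, 4))   # → 1.0
print(shaped_reward(True, 10))  # → 0.25
print(shaped_reward(False, 2))  # → 0.0
```

The design question these methods wrestle with is keeping the penalty strong enough to curb overthinking without letting it outweigh correctness, hence "adaptive" schedules rather than a fixed lambda.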

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠

What Do Visual Tokens Really Encode? Uncovering Sparsity and Redundancy in Multimodal Large Language Models

Researchers developed EmbedLens, a tool to analyze how multimodal large language models process visual information, finding that only 60% of visual tokens carry meaningful image-specific information. The study reveals significant inefficiencies in current MLLM architectures and proposes optimizations through selective token pruning and mid-layer injection.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠

Curvature-Weighted Capacity Allocation: A Minimum Description Length Framework for Layer-Adaptive Large Language Model Optimization

Researchers developed a new mathematical framework called Curvature-Weighted Capacity Allocation that optimizes large language model performance by identifying which layers contribute most to loss reduction. The method uses the Minimum Description Length principle to make principled decisions about layer pruning and capacity allocation under hardware constraints.

$NEAR
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠

ALTER: Asymmetric LoRA for Token-Entropy-Guided Unlearning of LLMs

Researchers introduce ALTER, a new framework for efficiently "unlearning" specific knowledge from large language models while preserving their overall utility. The system uses asymmetric LoRA architecture to selectively forget targeted information with 95% effectiveness while maintaining over 90% model utility, significantly outperforming existing methods.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠

KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models

Researchers have developed KDFlow, a new framework for compressing large language models that achieves 1.44x to 6.36x faster training speeds compared to existing knowledge distillation methods. The framework uses a decoupled architecture that optimizes both training and inference efficiency while reducing communication costs through innovative data transfer techniques.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠

Post-training Large Language Models for Diverse High-Quality Responses

Researchers have developed DQO (Diversity Quality Optimization), a new training method that uses determinantal point processes to improve large language models' response diversity while maintaining quality. The approach addresses a key limitation of current reinforcement learning methods that tend to narrow LLM outputs to canonical responses.
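The determinantal-point-process intuition behind DQO is easy to demonstrate: the determinant of a response-similarity kernel is large for diverse sets and collapses toward zero as responses become redundant. Toy numbers, not the training objective:

```python
# DPP diversity score in miniature: build a Gram kernel from response
# embeddings and compare determinants for a diverse set vs. a set
# containing a near-duplicate.

def det3(m):
    a, b, c = m[0]; d, e, f = m[1]; g, h, i = m[2]
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

def kernel(vecs):
    dot = lambda u, v: sum(x * y for x, y in zip(u, v))
    return [[dot(u, v) for v in vecs] for u in vecs]

diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
redundant = [[1.0, 0.0, 0.0], [0.99, 0.14, 0.0], [0.0, 0.0, 1.0]]

assert det3(kernel(diverse)) > det3(kernel(redundant))
```

Maximizing a determinant-based term alongside a quality reward is what lets the method resist the collapse toward a single canonical response that plain RL fine-tuning exhibits.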