#llm-fine-tuning News & Analysis

17 articles tagged with #llm-fine-tuning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Researchers introduce DOMINO, a framework that synthesizes domain-specific training data for large language models by learning from reference examples rather than explicit domain descriptions. The approach combines prompt tuning with contrastive learning to generate diverse, high-quality synthetic data without manual prompt engineering, improving coding task performance by up to 4.63%.

AI · Bullish · arXiv – CS AI · May 29 · 7/10

Overcoming Forgetting in LLM Fine-Tuning with Evolution Strategies

Researchers demonstrate that Evolution Strategies (ES) can effectively fine-tune large language models without catastrophic forgetting of prior tasks, contrary to recent concerns. By introducing Anchored Weight Decay (AWD), a regularization technique that constrains optimization toward initial parameters, the work shows ES-based continual learning is viable and computationally efficient compared to reinforcement learning approaches.
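The anchoring idea can be illustrated with a toy update rule. This is a minimal sketch, assuming the regularizer is an L2-style pull toward the initial parameters; the summary does not give the paper's exact formulation. The name `anchored_step`, the coefficient `lam`, and the direct gradient-style update are illustrative assumptions (the paper itself optimizes with Evolution Strategies, not the update shown here).

```python
def anchored_step(theta, theta0, grad, lr=0.1, lam=0.5):
    """One update whose decay term pulls theta toward theta0 rather than toward zero.

    Hypothetical helper for illustration only; not the paper's implementation.
    """
    return [t - lr * (g + lam * (t - t0)) for t, t0, g in zip(theta, theta0, grad)]


theta0 = [1.0, -2.0]   # initial (pre-fine-tuning) parameters, the "anchor"
theta = [1.5, -1.0]    # parameters after drifting away from the anchor
zero_grad = [0.0, 0.0]  # no task signal, so only the anchor term acts

for _ in range(200):
    theta = anchored_step(theta, theta0, zero_grad)

print(theta)  # close to theta0: the decay shrinks the drift, not the weights themselves
```

With ordinary weight decay the same loop would shrink the parameters toward zero; anchoring to `theta0` instead keeps the model near its starting behavior, which is the intuition behind using it against forgetting.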

AI · Bullish · arXiv – CS AI · May 7 · 7/10

Diffusion-Inspired Masked Fine-Tuning for Knowledge Injection in Autoregressive LLMs

Researchers demonstrate that masked fine-tuning, a demasking objective borrowed from diffusion models, significantly improves knowledge injection in autoregressive LLMs. It does so without expensive paraphrase augmentation and remains resistant to the reversal curse. The technique closes the performance gap between autoregressive and diffusion language models, with applications extending to math tasks and large-scale knowledge-intensive benchmarks.
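A demasking objective trains the model to recover tokens that were hidden in the input. The following is a minimal sketch of how such training pairs could be built, assuming uniform random masking at a fixed ratio; the paper's actual masking schedule, mask token, and loss are not described in the summary. `MASK`, `mask_tokens`, and `mask_ratio` are hypothetical names.

```python
import random

MASK = "[MASK]"  # hypothetical mask token; the real one depends on the tokenizer


def mask_tokens(tokens, mask_ratio, rng):
    """Return (masked_input, targets).

    targets[i] holds the original token where the input was masked and None
    elsewhere, so a loss would only be computed on the masked positions.
    """
    n_mask = min(len(tokens), max(1, round(mask_ratio * len(tokens))))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    targets = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return masked, targets


rng = random.Random(0)  # fixed seed so the example is repeatable
tokens = "Paris is the capital of France".split()
masked, targets = mask_tokens(tokens, mask_ratio=0.5, rng=rng)
print(masked)
print(targets)
```

Because any position can be masked, the model may be asked to recover the subject from the object as well as the reverse, which is one plausible reading of why such an objective would resist the reversal curse.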

AI · Bearish · arXiv – CS AI · May 1 · 7/10

Secret Stealing Attacks on Local LLM Fine-Tuning through Supply-Chain Model Code Backdoors

Researchers demonstrate a novel attack that steals sensitive secrets (API keys, personal identifiers, financial records) from locally fine-tuned language models by embedding malicious code in model architectures. The attack achieves a success rate above 98% and bypasses current defense mechanisms, including differential privacy and code auditing, exposing a critical supply-chain vulnerability in AI model development.

AI · Neutral · arXiv – CS AI · Jun 5 · 5/10

Improving Answer Extraction in Context-based Question Answering Systems Using LLMs

Researchers propose an improved question answering system using large language models fine-tuned on the SQuAD dataset, achieving strong performance metrics (ROUGE-L: 86.84%, BERTScore: 95.38%). The work addresses limitations in current LLM-based QA systems' ability to extract accurate answers from given contexts, demonstrating that targeted fine-tuning substantially enhances reliability and precision.
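For readers unfamiliar with the ROUGE-L figure quoted above, the metric is based on the longest common subsequence (LCS) between a predicted answer and the reference. The sketch below assumes whitespace tokenization and a balanced F1 combination of LCS precision and recall; the paper's exact ROUGE-L variant and preprocessing are not stated in the summary. The function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """LCS-based F1 between two strings, tokenized on whitespace (an assumption)."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


print(rouge_l_f1("the cat", "the cat sat on the mat"))
```

A short extracted answer that appears verbatim in a longer reference gets perfect precision but low recall, which is why extraction quality and answer length both affect the score.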

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

Learning What Matters: Probabilistic Task Selection via Mutual Information for Model Finetuning

Researchers introduce TaskPGM, a framework that optimizes how training data is distributed across multiple tasks when fine-tuning large language models by modeling task relationships through an energy-based probabilistic approach. The method balances task coverage against redundancy, demonstrating improvements over conventional uniform or size-proportional sampling strategies across multiple model families and evaluation benchmarks.

AI · Bullish · arXiv – CS AI · Jun 4 · 6/10

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

Researchers introduce StepPRM-RTL, a framework that enhances LLM-based RTL code generation for hardware design by combining stepwise trajectory modeling, process-reward models, and retrieval-augmented fine-tuning. The system achieves over 10% improvement in functional correctness compared to prior methods, advancing automation in hardware design workflows.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Efficient Exploration for Iterative Nash Preference Optimization

Researchers propose an improved Nash Learning from Human Feedback (NLHF) algorithm that addresses exploration challenges in preference alignment for large language models. The new method achieves better regret bounds without exponential dependence on regularization parameters and demonstrates empirical improvements when fine-tuning Llama-3-8B.

🧠 Llama

AI · Bullish · arXiv – CS AI · May 29 · 6/10

A Language-Guided Bayesian Optimization for Efficient LoRA Hyperparameter Search

Researchers propose a Bayesian Optimization framework that uses pre-trained Large Language Models to efficiently search for optimal LoRA (Low-Rank Adaptation) hyperparameters by encoding domain knowledge as natural language prompts. The method discovers high-performing configurations in ~30 iterations versus 45,000 combinations, achieving 20% performance improvements while significantly reducing computational costs.
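One of the LoRA hyperparameters such a search would tune is the rank. As a rough illustration of why it matters, the sketch below counts trainable parameters for a single weight matrix under the standard LoRA factorization (a `rank x d_in` matrix and a `d_out x rank` matrix). The layer size and the candidate ranks are illustrative assumptions, not values from the article.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters in the two low-rank factors: (rank x d_in) and (d_out x rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Trainable parameters when the full d_out x d_in matrix is updated."""
    return d_in * d_out


d_in = d_out = 4096  # illustrative layer size, not taken from the article
for rank in (4, 8, 16, 64):
    frac = lora_param_count(d_in, d_out, rank) / full_param_count(d_in, d_out)
    print(f"rank={rank:>2}  trainable fraction={frac:.4%}")
```

Rank interacts with other settings such as learning rate and scaling, which is why the joint space grows quickly and a sample-efficient search is attractive.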

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Beyond LoRA vs. Full Fine-Tuning: Gradient-Guided Optimizer Routing for LLM Adaptation

Researchers propose MoLF (Mixture of LoRA and Full Fine-Tuning), a hybrid framework that dynamically routes gradient updates between full fine-tuning and low-rank adaptation during LLM training. The approach addresses limitations of relying solely on either method, achieving competitive or superior performance across diverse tasks while maintaining training stability and memory efficiency.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

PSK@EEUCA 2026: Fine-Tuning Large Language Models with Synthetic Data Augmentation for Multi-Class Toxicity Detection in Gaming Chat

Researchers developed a toxicity detection system for gaming chat using fine-tuned Llama 3.1 with synthetic data augmentation, achieving 4th place in the EEUCA 2026 shared task. The system classifies messages into six toxicity categories and reveals a critical "validation trap" phenomenon where high validation performance doesn't correlate with strong test set generalization.

🧠 Llama

AI · Bullish · arXiv – CS AI · May 7 · 6/10

Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs

Researchers introduce Delta-Code Generation, a method where fine-tuned LLMs generate compact code diffs to modify existing neural architectures rather than creating complete models from scratch. The approach achieves significantly higher validity rates (66-75%) and accuracy (64-66%) compared to baseline full-generation methods while reducing output by 75-85%, demonstrating a more efficient paradigm for LLM-driven neural architecture search.

AI · Neutral · arXiv – CS AI · Apr 20 · 6/10

Reading Between the Lines: The One-Sided Conversation Problem

Researchers formalize the one-sided conversation problem (1SC), where only one participant's dialogue can be recorded, a situation common in telemedicine, call centers, and smart glasses. The study evaluates methods to reconstruct missing speaker turns and generate summaries from incomplete transcripts, finding that smaller models require fine-tuning while larger models show promise with prompting techniques.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions

Researchers propose Polynomial Expansion Rank Adaptation (PERA), a novel fine-tuning method that enhances Low-Rank Adaptation (LoRA) by incorporating high-order polynomial interactions into low-rank factors. PERA improves the expressive capacity of LLM fine-tuning without increasing computational costs, demonstrating consistent performance gains across benchmarks while maintaining the efficiency benefits of rank-constrained adaptation.

AI · Neutral · arXiv – CS AI · Apr 15 · 6/10

Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data

Researchers demonstrate that fine-tuning Large Language Models for report summarization is feasible on limited on-premise hardware (1-2 A100 GPUs), addressing practical constraints in sensitive government and intelligence applications. The study compares supervised and unsupervised approaches, finding that fine-tuning improves summary quality and reduces invalid outputs, even without ground-truth training data.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

New Hybrid Fine-Tuning Paradigm for LLMs: Algorithm Design and Convergence Analysis Framework

Researchers propose a novel hybrid fine-tuning method for Large Language Models that combines full parameter updates with Parameter-Efficient Fine-Tuning (PEFT) modules using zeroth-order and first-order optimization. The approach addresses computational constraints of full fine-tuning while overcoming PEFT's limitations in knowledge acquisition, backed by theoretical convergence analysis and empirical validation across multiple tasks.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10

FT-Dojo: Towards Autonomous LLM Fine-Tuning with Language Agents

Researchers introduce FT-Dojo, an interactive environment for studying autonomous LLM fine-tuning, along with FT-Agent, an AI system that can automatically fine-tune language models without human intervention. The system achieved the best performance on 10 of 13 tasks across five domains, demonstrating the potential for fully automated machine learning workflows while revealing current limitations in AI reasoning capabilities.