#supervised-finetuning News & Analysis

6 articles tagged with #supervised-finetuning. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 11 · 7/10

OpenMedReason: Scientific Reasoning Supervision for Medical Vision-Language Models

Researchers introduce OpenMedReason, a 450K-instance dataset of medical images paired with reasoning traces derived from scientific literature, designed to improve vision-language models for clinical applications. The dataset yields a 20% accuracy improvement on medical visual question answering and shows that AI models can learn to ground diagnostic reasoning in evidence rather than producing unjustified answers.

🏢 Hugging Face

AI · Bullish · arXiv – CS AI · May 27 · 7/10

GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

GUI-Libra presents a specialized training methodology for native GUI agents that addresses critical gaps between open-source and closed-source systems through action-aware supervised fine-tuning and improved reinforcement learning with partial verifiability. The work introduces an 81K curated GUI reasoning dataset and demonstrates consistent improvements across web and mobile benchmarks without requiring expensive online data collection.

AI · Neutral · arXiv – CS AI · Apr 10 · 7/10

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Researchers challenge the conventional wisdom that supervised finetuning (SFT) merely memorizes while reinforcement learning generalizes. Their analysis reveals that reasoning SFT with chain-of-thought supervision can generalize across domains, but success depends critically on optimization duration, data quality, and base model strength, with generalization improvements coming at the cost of degraded safety performance.

AI · Bullish · arXiv – CS AI · Apr 7 · 7/10

PassiveQA: A Three-Action Framework for Epistemically Calibrated Question Answering via Supervised Finetuning

Researchers propose PassiveQA, a framework that teaches language models to recognize when they lack enough information to answer a question, so that they ask for clarification or abstain rather than hallucinate a response. The three-action system (Answer, Ask, Abstain) uses supervised fine-tuning to align model behavior with information sufficiency and significantly reduces hallucinations.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Researchers introduce Agent Data Protocol (ADP), a standardized format that unifies AI agent training datasets drawn from different formats and tools. The protocol enabled training on 13 unified datasets, achieving ~20% performance gains over base models and state-of-the-art results on coding, browsing, and tool-use benchmarks.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Zipping the Thought: When and How Compressed Reasoning Data Works in LLM Post-Training

Researchers propose a taxonomy of chain-of-thought (CoT) reasoning in LLM post-training, distinguishing between explicit, composed, and implicit reasoning formats. The study reveals that compressed reasoning data requires different training approaches, with composed CoT benefiting from data scaling while implicit CoT risks memorization, and that reinforcement learning can decompose compressed steps learned during supervised fine-tuning.