AI · Neutral · arXiv – CS AI · 7h ago · 7/10

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Researchers challenge the conventional wisdom that supervised fine-tuning (SFT) merely memorizes while reinforcement learning generalizes. Their analysis shows that reasoning SFT with chain-of-thought supervision can generalize across domains, but success depends critically on optimization duration, data quality, and base model strength. Notably, these generalization gains come at the cost of degraded safety performance.