y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#trust-region News & Analysis

2 articles tagged with #trust-region. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

2 articles
AIBullisharXiv โ€“ CS AI ยท 2d ago7/10
๐Ÿง 

Proximal Supervised Fine-Tuning

Researchers propose Proximal Supervised Fine-Tuning (PSFT), a new method that applies trust-region constraints from reinforcement learning to improve how foundation models adapt to new tasks. The technique maintains model capabilities while fine-tuning, outperforming standard supervised fine-tuning on out-of-domain generalization tasks.

AIBullisharXiv โ€“ CS AI ยท Mar 26/1014
๐Ÿง 

Trust Region Masking for Long-Horizon LLM Reinforcement Learning

Researchers propose Trust Region Masking (TRM) to address off-policy mismatch problems in Large Language Model reinforcement learning pipelines. The method provides the first non-vacuous monotonic improvement guarantees for long-horizon LLM-RL tasks by masking entire sequences that violate trust region constraints.