AI · Bullish · arXiv – CS AI · 15h ago · 7/10
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
Researchers introduce SAERL, a data engineering framework that uses Sparse Autoencoders (SAEs) to extract intrinsic signals from LLM internals and use them to guide reinforcement learning post-training. By modeling data diversity, difficulty, and quality, the method achieves 3% accuracy gains and 20% faster convergence on math reasoning tasks. The results indicate that model internals provide practical signals beyond external training-data metrics.
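To make the idea of an SAE-derived data signal concrete, here is a minimal sketch in plain Python. It is not SAERL's actual method, which the summary does not specify. It assumes a standard ReLU sparse-autoencoder encoder and uses a made-up diversity proxy: the fraction of SAE features that fire on at least one sample in a batch. The names `sae_encode`, `active_features`, `feature_coverage`, `W_ENC`, and `B_ENC` are all hypothetical, and the weights are toy values rather than a trained SAE.

```python
from typing import List, Sequence, Set

Vector = Sequence[float]


def sae_encode(activation: Vector, w_enc: Sequence[Vector], b_enc: Vector) -> List[float]:
    """Standard SAE encoder: ReLU(W_enc @ activation + b_enc)."""
    return [
        max(0.0, sum(w * a for w, a in zip(row, activation)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def active_features(features: Vector, threshold: float = 0.0) -> Set[int]:
    """Indices of features whose value exceeds the threshold."""
    return {i for i, f in enumerate(features) if f > threshold}


def feature_coverage(samples: Sequence[Vector], w_enc: Sequence[Vector], b_enc: Vector) -> float:
    """Hypothetical diversity proxy: share of SAE features active on any sample."""
    covered: Set[int] = set()
    for activation in samples:
        covered |= active_features(sae_encode(activation, w_enc, b_enc))
    return len(covered) / len(w_enc)


# Toy encoder (assumption): 4 features over 2-dimensional activations.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
B_ENC = [0.0, 0.0, 0.0, 0.0]

# Stand-ins for model activations on two candidate training batches.
narrow_batch = [[1.0, 0.0], [2.0, 0.0]]
broad_batch = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

narrow_score = feature_coverage(narrow_batch, W_ENC, B_ENC)
broad_score = feature_coverage(broad_batch, W_ENC, B_ENC)
print(narrow_score, broad_score)
```

In this toy setup the broad batch activates more distinct features than the narrow one, so it scores higher on the coverage proxy. A real pipeline would take activations from an actual LLM layer and a trained SAE, and would likely combine several signals rather than one.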