AI · Bullish · arXiv – CS AI · 14h ago · 7/10
Less is Enough: Synthesizing Diverse Data in LLM Feature Space with Sparse Autoencoders
Researchers propose Feature Activation Coverage (FAC), a metric that measures the diversity of LLM training data in the model's feature space, using sparse autoencoders, rather than with traditional text-based metrics. Their FAC Synthesis framework generates synthetic training data to fill gaps in feature coverage. The authors report consistent improvements across multiple tasks and find that the feature spaces transfer across different model families.
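The summary does not give a formula for FAC, so the following is only a minimal sketch of one plausible reading: treat a sparse-autoencoder feature as "covered" when at least one sample in the dataset activates it above a threshold, and report the covered fraction. The function names, the `threshold` parameter, and the toy activation matrix are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def covered_features(
    activations: Sequence[Sequence[float]],
    num_features: int,
    threshold: float = 0.0,
) -> set:
    """Return indices of features that fire above `threshold` on any sample.

    `activations` is a samples-by-features matrix of SAE feature activations
    (assumed layout; the paper's actual representation may differ).
    """
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    covered = set()
    for row in activations:
        if len(row) != num_features:
            raise ValueError("every row must have num_features entries")
        for index, value in enumerate(row):
            if value > threshold:
                covered.add(index)
    return covered


def feature_activation_coverage(
    activations: Sequence[Sequence[float]],
    num_features: int,
    threshold: float = 0.0,
) -> float:
    """Fraction of features activated by at least one sample (assumed definition)."""
    return len(covered_features(activations, num_features, threshold)) / num_features


def uncovered_features(
    activations: Sequence[Sequence[float]],
    num_features: int,
    threshold: float = 0.0,
) -> List[int]:
    """Feature indices no sample activates: the gaps that synthesis would target."""
    covered = covered_features(activations, num_features, threshold)
    return [i for i in range(num_features) if i not in covered]


# Toy data: 3 samples, 4 features. Features 0 and 1 fire; 2 and 3 never do.
toy_activations = [
    [0.0, 1.2, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.3, 0.0, 0.0],
]
coverage = feature_activation_coverage(toy_activations, num_features=4)
gaps = uncovered_features(toy_activations, num_features=4)
```

On the toy data, half of the features are covered and features 2 and 3 are the gaps. Under this reading, a synthesis step would aim to generate samples that activate exactly those uncovered features.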