AIBullisharXiv โ CS AI ยท 5d ago6/104
๐ง
Post-training Large Language Models for Diverse High-Quality Responses
Researchers have developed DQO (Diversity Quality Optimization), a new training method that uses determinantal point processes to improve large language models' response diversity while maintaining quality. The approach addresses a key limitation of current reinforcement learning methods that tend to narrow LLM outputs to canonical responses.