arXiv – CS AI · 6h ago
Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation
Researchers introduce CoNL, a framework that enables large language models to improve themselves through multi-agent self-play without requiring ground-truth labels or external judges. The system treats critiques that successfully improve a solution as training signals, allowing the model to jointly optimize its generation and evaluation capabilities on non-verifiable tasks such as creative writing and ethical reasoning.
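The summary only sketches the mechanism, so the following is a minimal illustrative sketch of what such a self-play loop could look like, not the paper's actual method: a draft is generated, critiqued, and revised, and a critique is kept as a training signal only if the model's own meta-evaluation prefers the revision. All helper names, prompts, and the comparison rule here are hypothetical assumptions for illustration.

```python
# Hypothetical sketch of a CoNL-style self-play loop. The real prompts,
# comparison rule, and training objective are not given in the summary;
# every helper name below is an assumption.

import random
from dataclasses import dataclass

@dataclass
class Example:
    prompt: str
    draft: str
    critique: str
    revision: str

def generate(model, prompt: str) -> str:
    # Sample a draft solution from the policy model (placeholder call).
    return model(f"Solve: {prompt}")

def critique(model, prompt: str, draft: str) -> str:
    # Sample a critique of the draft from the same model.
    return model(f"Critique this answer to '{prompt}': {draft}")

def revise(model, prompt: str, draft: str, crit: str) -> str:
    # Revise the draft conditioned on the critique.
    return model(f"Revise '{draft}' for '{prompt}' using feedback: {crit}")

def prefers_revision(model, prompt: str, draft: str, revision: str) -> bool:
    # Meta-evaluation step: the model itself judges whether the revision
    # improves on the draft, with no ground-truth label or external judge.
    verdict = model(f"For '{prompt}', is B better than A?\nA: {draft}\nB: {revision}")
    return "yes" in verdict.lower()

def collect_training_signals(model, prompts):
    # Keep only critiques whose revisions the meta-evaluator prefers; these
    # (prompt, draft, critique, revision) tuples would then serve as training
    # signals for both the generation and the evaluation roles.
    kept = []
    for p in prompts:
        d = generate(model, p)
        c = critique(model, p, d)
        r = revise(model, p, d, c)
        if prefers_revision(model, p, d, r):
            kept.append(Example(p, d, c, r))
    return kept

if __name__ == "__main__":
    # Toy stand-in for an LLM so the sketch runs end to end.
    toy_model = lambda text: random.choice(["yes, B is better", "no"]) if "better" in text else text[::-1]
    print(len(collect_training_signals(toy_model, ["Write a short poem about rain."])))
```

In a full system the kept tuples would presumably feed a fine-tuning or preference-optimization step for both roles, but that detail is beyond what the summary states.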