AI · Bullish · arXiv CS AI · Feb 27
Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
Researchers introduce Duel-Evolve, an optimization algorithm that improves LLM performance at test time without requiring external rewards or labels. The method relies on pairwise comparisons that the model generates itself, and it achieved accuracy gains of 20 percentage points on MathBench and 12 percentage points on LiveCodeBench.
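To make the idea of selecting answers from pairwise comparisons concrete, here is a minimal sketch. It is not the Duel-Evolve algorithm: the summary does not say how comparisons are aggregated or how candidates evolve, so this simply counts round-robin duel wins. The names `rank_by_duels` and `toy_prefer` are hypothetical, and `toy_prefer` is a toy stand-in for an LLM judging its own outputs.

```python
from itertools import combinations
from typing import Callable, Sequence


def rank_by_duels(
    candidates: Sequence[str],
    prefer: Callable[[str, str], bool],
) -> list[str]:
    """Order candidates by how many pairwise duels each one wins.

    `prefer(a, b)` returns True when `a` is judged better than `b`.
    Assumption: in a real system this would be an LLM self-preference
    call; no external reward or label is consulted.
    """
    wins = {i: 0 for i in range(len(candidates))}
    # Round-robin: every candidate duels every other candidate once.
    for i, j in combinations(range(len(candidates)), 2):
        if prefer(candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    # Most wins first; ties keep their original order (sorted is stable).
    order = sorted(wins, key=lambda i: wins[i], reverse=True)
    return [candidates[i] for i in order]


def toy_prefer(a: str, b: str) -> bool:
    # Toy judge for illustration only: prefers the longer answer.
    return len(a) > len(b)


ranked = rank_by_duels(["x = 2", "x = 2 because 2 + 2 = 4", "x"], toy_prefer)
best = ranked[0]
```

A round-robin needs a number of comparisons that grows quadratically with the candidate pool, so a practical system would likely sample duels rather than run all of them; that choice is also an assumption here, not something the summary states.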