AI · Bullish · arXiv CS AI · 6h ago
Provable and Practical In-Context Policy Optimization for Self-Improvement
Researchers introduce In-Context Policy Optimization (ICPO), a new method that allows AI models to improve their responses during inference through multi-round self-reflection without parameter updates. The practical ME-ICPO algorithm demonstrates competitive performance on mathematical reasoning tasks while maintaining affordable inference costs.