AI · Bullish · arXiv – CS AI · 10h ago · 7/10
Forge: Quality-Aware Reinforcement Learning for NP-Hard Optimization in LLMs
Researchers introduce OPT-BENCH, a framework for training LLMs on NP-hard optimization problems using quality-aware reinforcement learning. In tests on Qwen2.5-7B, the approach achieves a 93.1% success rate and a 46.6% quality ratio, substantially outperforming GPT-4o, and shows transfer benefits across mathematics, logic, and reasoning tasks.
🧠 GPT-4