Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards
Researchers show that Reinforcement Learning with Verifiable Rewards (RLVR) can train large language models (LLMs) to negotiate effectively in incomplete-information games such as price bargaining. A 30B-parameter model trained this way outperforms frontier models ten times its size, develops sophisticated persuasive strategies, and generalizes to unseen negotiation scenarios.
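In RLVR, the reward comes from a rule-based check rather than from a learned reward model. Price bargaining suits this setup because the outcome of an episode, a deal at some price or no deal, can be scored directly against each side's private valuation. The sketch below illustrates the idea for a buyer-side agent. It rests on stated assumptions: the names `BargainOutcome` and `buyer_reward` are hypothetical, and the normalized-surplus reward is one plausible choice, not the reward used in the work summarized here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BargainOutcome:
    """Hypothetical record of one finished buyer-seller episode."""
    buyer_value: float             # buyer's private reservation value (max willingness to pay)
    seller_cost: float             # seller's private reservation cost (min acceptable price)
    agreed_price: Optional[float]  # None if the parties never reached a deal


def buyer_reward(outcome: BargainOutcome) -> float:
    """Illustrative verifiable reward: fraction of the bargaining surplus the buyer captures.

    The surplus (zone of possible agreement) is buyer_value - seller_cost.
    The reward is 0.0 for no deal and 1.0 if the buyer pays exactly the
    seller's cost. It is 0.0 at the buyer's own limit and negative if the
    buyer overpays. Every quantity is checkable from the transcript and the
    private values, so no learned judge is needed.
    """
    surplus = outcome.buyer_value - outcome.seller_cost
    if surplus <= 0:
        # Assumption: only consider games where a mutually beneficial price exists.
        raise ValueError("buyer_value must exceed seller_cost")
    if outcome.agreed_price is None:
        return 0.0
    return (outcome.buyer_value - outcome.agreed_price) / surplus


# Usage: the buyer values the item at 100, the seller's floor is 60, and they settle at 70.
print(buyer_reward(BargainOutcome(buyer_value=100, seller_cost=60, agreed_price=70)))  # 0.75
```

A reward like this is cheap to compute and hard to game, because the agent cannot raise its score by sounding persuasive. Only the price it actually secures counts. A symmetric function would score the seller side.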