EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation
Researchers introduce EmoDistill, an offline framework that teaches language model agents to use emotion strategically in adversarial negotiations. The system decomposes emotional strategy into two parts: choosing which emotion to convey (selection) and phrasing the message that conveys it (expression). Experiments show that emotionally framed language significantly shifts negotiation outcomes, suggesting that emotion functions as a tactical tool rather than stylistic decoration.
EmoDistill addresses a critical vulnerability in aligned language models: their alignment-induced tendency toward politeness and safety can be exploited in adversarial settings, where emotional framing steers agents toward suboptimal outcomes. By treating emotion as a learnable strategic action rather than surface-level behavior, this research reframes how we understand LLM decision-making in competitive environments. The framework has two components: Implicit Q-Learning (IQL) chooses which emotion to use, and LoRA-based fine-tuning teaches the model to express it. This split separates the strategic decision from the wording of each message, so each negotiation skill is learned with a method suited to it.
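The expression component relies on LoRA (low-rank adaptation), which freezes a pretrained weight matrix W and trains only a small update (alpha / r) * B @ A. The pure-Python sketch below shows a single adapted layer. It is an illustration of LoRA in general, not EmoDistill's code: the article does not say which layers, rank, or scaling the framework uses, so `LoRALinear`, `rank=2`, and `alpha=4.0` are hypothetical choices. In practice, libraries such as Hugging Face PEFT apply the same idea to transformer layers.

```python
import random


def matvec(m, x):
    """Multiply matrix m (list of rows) by vector x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def matmul(a, b):
    """Multiply matrix a by matrix b (both lists of rows)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


class LoRALinear:
    """Hypothetical LoRA layer: frozen weight W plus trainable (alpha / r) * B @ A."""

    def __init__(self, weight, rank=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, never updated
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so the adapter is a no-op at first.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))  # low-rank path: B @ (A @ x)
        return [y + self.scale * d for y, d in zip(base, delta)]

    def merged_weight(self):
        """Fold the adapter into W so inference needs only one matrix."""
        ba = matmul(self.b, self.a)
        return [[w + self.scale * u for w, u in zip(wr, ur)]
                for wr, ur in zip(self.weight, ba)]


w = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(w, rank=2, alpha=4.0)
x = [1.0, 2.0, 3.0]
before = layer.forward(x)
print(before)  # [7.0, -1.0]: B is zero, so the output equals W @ x
layer.b = [[0.1, 0.0], [0.0, 0.1]]  # stand-in for values learned during fine-tuning
after = layer.forward(x)
merged = matvec(layer.merged_weight(), x)
print(after, merged)  # equal up to floating-point rounding
```

Only A and B would be trained, which keeps fine-tuning cheap, and because the update is an ordinary matrix product it can be merged back into W after training.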
This work builds on a growing recognition that LLM alignment, while beneficial for safety, creates exploitable patterns in high-stakes interactions. Previous research showed that adversarially designed prompts can manipulate model outputs; EmoDistill systematizes this insight by training agents to use emotion strategically. Because training is offline, learning from previously collected data rather than from live interaction, the framework avoids expensive online negotiation simulations, which makes it cheaper to train and more practical for real-world deployment.
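IQL, the method the article names for emotion selection, is an offline reinforcement-learning algorithm: it learns from a fixed log of transitions and never queries a live counterparty. The sketch below is a tabular simplification of its three steps applied to discrete emotion choices: expectile regression for a state value V, a temporal-difference update for Q that bootstraps from V instead of a max over unlogged actions, and advantage-weighted extraction of a policy. The state names, emotion labels, rewards, and hyperparameters (`tau`, `gamma`, `beta`, and so on) are hypothetical; the article does not describe EmoDistill's actual state representation or reward.

```python
import math
from collections import defaultdict


def expectile_weight(u, tau):
    """Asymmetric weight |tau - 1(u < 0)| from IQL's expectile loss."""
    return (1.0 - tau) if u < 0 else tau


def train_iql(transitions, tau=0.7, gamma=0.9, lr=0.1, epochs=200):
    """Tabular IQL on a fixed log of (state, emotion, reward, next_state, done)."""
    q = defaultdict(float)  # q[(state, emotion)]
    v = defaultdict(float)  # v[state]
    for _ in range(epochs):
        for s, a, r, s_next, done in transitions:
            # 1) Expectile regression of V(s) toward Q(s, a), using logged emotions only.
            u = q[(s, a)] - v[s]
            v[s] += lr * 2.0 * expectile_weight(u, tau) * u
            # 2) TD update of Q using V(s'), so no max over unseen emotions is needed.
            target = r + (0.0 if done else gamma * v[s_next])
            q[(s, a)] += lr * (target - q[(s, a)])
    return q, v


def extract_policy(transitions, q, v, beta=3.0, max_weight=100.0):
    """3) Reweight logged emotions by clipped exp(beta * advantage), then normalize."""
    weights = defaultdict(lambda: defaultdict(float))
    for s, a, _, _, _ in transitions:
        adv = q[(s, a)] - v[s]
        weights[s][a] += min(math.exp(beta * adv), max_weight)
    policy = {}
    for s, per_emotion in weights.items():
        total = sum(per_emotion.values())
        policy[s] = {a: w / total for a, w in per_emotion.items()}
    return policy


# Hypothetical log: the opponent made a counteroffer and the agent answered with
# one emotion; reward is the (made-up) normalized deal value it obtained.
log = [
    ("counteroffer", "anger", 1.0, "end", True),
    ("counteroffer", "neutral", 0.2, "end", True),
    ("counteroffer", "anger", 0.8, "end", True),
    ("counteroffer", "neutral", 0.4, "end", True),
]
q, v = train_iql(log)
policy = extract_policy(log, q, v)
best = max(policy["counteroffer"], key=policy["counteroffer"].get)
print(best)  # anger
```

Because V is fit only to emotions that actually appear in the log, the learner never has to estimate the value of emotions it has no data for, which is what lets it improve on the logged behavior without online negotiation.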
For AI safety researchers and developers building autonomous agents for competitive domains, EmoDistill raises important questions about trade-offs between robustness and alignment. The framework outperforms vanilla baselines across four domains and transfers to settings it was not trained on, which indicates that strategic emotion use is a generalizable skill. Organizations developing negotiation agents, trading systems, or other adversarial applications must now consider whether their models are equipped to compete in environments where emotional intelligence is weaponized. The research suggests that future AI systems may require explicit defensive training against emotionally framed manipulation tactics.
- →Emotion functions as a strategic action in negotiation, not merely stylistic decoration, with a measurable effect on negotiation outcomes.
- →EmoDistill's decomposition into emotion selection and emotion expression outperforms monolithic LLM baselines in competitive settings.
- →The offline training framework eliminates the need for costly online agent-to-agent negotiation during skill acquisition.
- →Trained models transfer well to unseen domains, unseen counterparties, and tournaments in which trained agents negotiate against each other.
- →AI alignment toward politeness creates exploitable vulnerabilities in adversarial contexts requiring strategic emotional intelligence.