#regret-minimization News & Analysis

3 articles tagged with #regret-minimization. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

3 articles

AIBullisharXiv – CS AI · Jun 96/10

🧠

A Regret Minimization Framework on Preference Learning in Large Language Models

Researchers introduce Regret-based Preference Optimization (RePO), a new framework for training large language models that reinterprets reinforcement learning from human feedback (RLHF) through regret minimization rather than reward maximization. The approach models human preferences as behavior-conditioned assessments of relative suboptimality, showing consistent performance gains on mathematical reasoning and preference benchmarks.

AINeutralarXiv – CS AI · Jun 56/10

🧠

Regret Minimization with Adaptive Opponents in Repeated Games

Researchers introduce Repeated Policy Regret (RP-Regret), a new game-theoretic metric for analyzing regret minimization in repeated games with adaptive opponents who can respond to historical play. The paper proposes three algorithms to minimize RP-Regret despite its non-convex nature and demonstrates that when all players use these algorithms, certain subgame perfect equilibria can be learned, with experiments showing improved cooperation in games like Stag-Hunt.

AIBullisharXiv – CS AI · Jun 16/10

🧠

Post-Training LLMs as Better Decision-Making Agents: A Regret-Minimization Approach

Researchers introduce Iterative Regret-Minimization Fine-Tuning (Iterative RMFT), a post-training method that improves LLMs' decision-making capabilities by iteratively distilling low-regret trajectories back into models. The approach addresses fundamental limitations in how LLMs handle online decision problems without relying on rigid algorithmic templates, demonstrating improvements across multiple model architectures.

🧠 GPT-4