#self-supervision News & Analysis

2 articles tagged with #self-supervision. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · May 9 · 7/10

Internalizing Outcome Supervision into Process Supervision: A New Paradigm for Reinforcement Learning for Reasoning

Researchers propose a novel reinforcement learning framework that automatically generates process-level supervision from outcome-only feedback, eliminating the need for costly external process supervision. This approach enables fine-grained credit assignment in reasoning tasks by having models identify and learn from their own failed trajectories.

AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4

Self-Harmony: Learning to Harmonize Self-Supervision and Self-Play in Test-Time Reinforcement Learning

Researchers introduce Self-Harmony, a test-time reinforcement learning framework that improves model accuracy by having the model both solve each problem and rephrase the question. The method selects stable answers using harmonic mean aggregation instead of majority voting, achieving state-of-the-art results on 28 of 30 reasoning benchmarks without requiring human supervision.
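
To make the contrast with majority voting concrete, here is a minimal sketch of harmonic-mean answer selection. It assumes answers are sampled separately for the original question and for a rephrased version, and that each candidate is scored by the harmonic mean of its frequency in the two sets. The function names and the toy data are hypothetical, and the paper's actual procedure may differ in its details.

```python
from collections import Counter


def majority_vote(answers):
    """Baseline: return the most frequent answer in a single set of samples."""
    return Counter(answers).most_common(1)[0][0]


def harmonic_mean_select(original_answers, rephrased_answers):
    """Return (answer, score) for the answer that is frequent in BOTH views.

    Assumption for this sketch: the score is the harmonic mean of the
    answer's relative frequency under the original question and under the
    rephrased question. An answer missing from either view scores 0.
    """
    if not original_answers or not rephrased_answers:
        return None, 0.0

    orig_counts = Counter(original_answers)
    reph_counts = Counter(rephrased_answers)
    best_answer, best_score = None, 0.0

    # Only answers seen in both views can have a nonzero harmonic mean.
    # Sorting makes tie-breaking deterministic.
    for answer in sorted(orig_counts.keys() & reph_counts.keys()):
        f_orig = orig_counts[answer] / len(original_answers)
        f_reph = reph_counts[answer] / len(rephrased_answers)
        score = 2 * f_orig * f_reph / (f_orig + f_reph)
        if score > best_score:
            best_answer, best_score = answer, score

    return best_answer, best_score


# Toy data (made up): "12" wins a plain vote on the original question
# but never appears once the question is rephrased.
original = ["12", "12", "12", "7", "7"]
rephrased = ["7", "7", "7", "7", "9"]

print(majority_vote(original))
print(harmonic_mean_select(original, rephrased))
```

In this toy case the plain vote picks "12", while the harmonic mean picks "7", because the harmonic mean collapses toward zero whenever an answer is rare in either view. That is one reading of what "stable" means in the summary above.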