y0news

#reward-shaping News & Analysis

6 articles tagged with #reward-shaping. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · 4d ago · 6/10

PIRS: Physics-Informed Reward Shaping for SAC-Based Building Energy Management

Researchers introduce PIRS (Physics-Informed Reward Shaping), a method that improves deep reinforcement learning controllers for building energy management by replacing ad-hoc comfort metrics with the ISO 7730 Predicted Mean Vote (PMV) index. Tested on CityLearn v2.1.2, PIRS performs competitively with manual baselines while substantially outperforming non-physics-grounded approaches on load-ramping and peak-demand metrics.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

OracleTSC: Oracle-Informed Reward Hurdle and Uncertainty Regularization for Traffic Signal Control

Researchers introduce OracleTSC, an LLM-based traffic signal control system that combines a reward hurdle mechanism with uncertainty regularization to stabilize reinforcement learning training. The approach achieves a 75% reduction in travel time, keeps its decisions interpretable through natural language explanations, and generalizes well across intersections.

AI · Neutral · arXiv – CS AI · May 12 · 6/10

PiCA: Pivot-Based Credit Assignment for Search Agentic Reinforcement Learning

Researchers introduce PiCA (Pivot-Based Credit Assignment), a reinforcement learning mechanism that improves how LLM-based search agents learn from long sequences of actions. By identifying key pivot steps and anchoring rewards to final task outcomes, PiCA targets the credit assignment problem and delivers a 15.2% performance gain on knowledge-intensive QA tasks.

AI · Neutral · arXiv – CS AI · May 9 · 6/10

Owen-Shapley Policy Optimization: A Principled RL Algorithm for Generative Search LLMs

Researchers introduce Owen-Shapley Policy Optimization (OSPO), a reinforcement learning algorithm that improves how language models learn from feedback by attributing credit to individual tokens rather than treating entire sequences as atomic units. The method addresses a fundamental training gap in generative AI systems used for recommendation tasks, showing measurable improvements on real e-commerce datasets.

AI · Bullish · arXiv – CS AI · Apr 14 · 6/10

The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

Researchers introduce MEDS, a memory-enhanced reward shaping framework that addresses a reinforcement learning failure mode in which language models repeatedly generate similar errors. By tracking historical behavioral patterns and penalizing recurring mistake clusters, the method achieves consistent performance improvements across multiple datasets and models while increasing sampling diversity.

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

MVR: Multi-view Video Reward Shaping for Reinforcement Learning

Researchers introduce Multi-View Video Reward Shaping (MVR), a new reinforcement learning framework that uses multi-viewpoint video analysis and vision-language models to improve reward design for complex AI tasks. The system addresses limitations of single-image approaches by analyzing dynamic motions across multiple camera angles, showing improved performance on humanoid locomotion and manipulation tasks.