AINeutralarXiv – CS AI · 18h ago6/10
🧠
Can the Environment Speak for Itself? $T^{2}$-GRPO: A Turn-Trajectory Group Relative Policy Optimization for Caregiver Agents
Researchers propose T²-GRPO, a reinforcement learning framework that optimizes large language models for dementia caregiver agents by balancing immediate patient feedback with long-term care outcomes. The method uses environment-grounded rewards and safety constraints to improve emotional intelligence in AI caregiving scenarios.