AI · Neutral · arXiv – CS AI · Apr 7 · 6/10
When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling
The study reports that adaptive reward mechanisms in LLM-guided LEO satellite scheduling hurt performance: static reward weights achieved 342.1 Mbps, while dynamic weights reached only 103.3 Mbps. Fine-tuned LLMs performed poorly because of weight oscillation, whereas simpler MLP models achieved a superior result of 357.9 Mbps.
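To put the quoted figures side by side, the short sketch below computes the relative gaps between them. It uses only the three throughput numbers from the summary. The dictionary keys and helper names are illustrative labels, not identifiers from the paper, and the comparison assumes the three figures were measured under comparable conditions, which the summary does not state.

```python
# Throughput figures quoted in the summary, in Mbps.
# The keys are illustrative labels, not names from the paper.
throughput_mbps = {
    "static_weights": 342.1,
    "dynamic_weights": 103.3,
    "mlp": 357.9,
}


def ratio(a: str, b: str) -> float:
    """Return how many times larger the throughput of `a` is than that of `b`."""
    return throughput_mbps[a] / throughput_mbps[b]


def percent_gain(a: str, b: str) -> float:
    """Return the throughput gain of `a` over `b`, as a percentage of `b`."""
    return (throughput_mbps[a] - throughput_mbps[b]) / throughput_mbps[b] * 100.0


static_vs_dynamic = ratio("static_weights", "dynamic_weights")
mlp_gain_over_static = percent_gain("mlp", "static_weights")

print(f"static / dynamic throughput ratio: {static_vs_dynamic:.2f}x")
print(f"MLP gain over static weights: {mlp_gain_over_static:.1f}%")
```

Read this way, the gap between static and dynamic weights is far larger than the gap between the MLP and the static baseline.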