AI · Bullish · arXiv – CS AI · 10h ago · 7/10
Reinforcement learning to improve large language model-based automated code compliance systems
Researchers introduce P4IR, a two-stage framework that combines supervised fine-tuning (SFT) with Group Relative Policy Optimization (GRPO) to improve LLM accuracy in automated building code compliance systems. The approach reduces errors by up to 38.6% compared with baseline models and outperforms leading LLMs such as Claude and GPT in zero-shot settings.
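The summary names GRPO but does not describe P4IR's reward design. As a rough illustration of the "group relative" idea in GRPO, the Python sketch below computes a per-completion advantage by normalizing each reward against the other completions sampled for the same prompt. The function name `group_relative_advantages`, the epsilon, the use of population standard deviation, and the binary "matches the reference compliance verdict" reward are all assumptions for illustration, not details from the paper.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical helper: score each sampled completion relative to its group.

    GRPO samples several completions for one prompt; each completion's
    advantage is its reward minus the group mean, divided by the group's
    standard deviation. Population std (pstdev) and eps are assumed choices.
    """
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical binary rewards for four completions of one compliance-check prompt:
# 1.0 if the completion's verdict matched a reference verdict, else 0.0.
rewards = [1.0, 0.0, 1.0, 1.0]
print(group_relative_advantages(rewards))
```

Completions that score above their group's average receive positive advantages and are reinforced, while below-average ones are pushed down. Because the baseline comes from the group itself, GRPO does not need a separately trained value (critic) model.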
GPT-5 · Claude · Sonnet