AI · Bullish · arXiv - CS AI · 2d ago · 7/10
IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs
OpenAI researchers introduce IH-Challenge, a reinforcement learning dataset designed to improve instruction hierarchy in frontier LLMs. Fine-tuning GPT-5-Mini with this dataset improved robustness by 10% and significantly reduced unsafe behavior while maintaining helpfulness.
OpenAI · Hugging Face · GPT-5