AIBullish — arXiv CS AI · Mar 5 · 7/10
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
Researchers developed a training method that combines Chain-of-Thought supervision with reinforcement learning to teach large language models when to abstain from temporal questions they cannot reliably answer. The approach enabled a smaller Qwen2.5-1.5B model to outperform GPT-4o on temporal question answering while improving reliability on unanswerable questions by 20%.
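A minimal sketch of how an abstention-aware reward might look in such an RL setup. This is an illustrative assumption, not the paper's actual reward design: correct answers earn full reward, wrong answers are penalized, and abstaining earns partial credit (full credit when the question is genuinely unanswerable).

```python
from typing import Optional

# Hypothetical reward shaping for abstention-aware RL (illustrative only).
def abstention_reward(prediction: str, gold: Optional[str]) -> float:
    """Score a model output against a gold answer (gold=None means unanswerable)."""
    abstained = prediction.strip().lower() in {"i don't know", "unanswerable"}
    if gold is None:
        # On unanswerable questions, abstaining is the correct behavior.
        return 1.0 if abstained else -1.0
    if abstained:
        # Small credit: silence beats a confident wrong guess.
        return 0.1
    return 1.0 if prediction.strip() == gold else -1.0

print(abstention_reward("1969", "1969"))        # correct answer -> 1.0
print(abstention_reward("i don't know", None))  # correct abstention -> 1.0
print(abstention_reward("1972", "1969"))        # wrong guess -> -1.0
```

Under a scheme like this, the policy only learns to abstain when its expected accuracy on a question falls below the abstention credit, which is the trade-off the paper's title alludes to.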
GPT-4