AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?
Researchers developed a new training method combining Chain-of-Thought supervision with reinforcement learning to teach large language models when to abstain from answering temporal questions they're uncertain about. Their approach enabled a smaller Qwen2.5-1.5B model to outperform GPT-4o on temporal question answering tasks while improving reliability by 20% on unanswerable questions.