
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?

arXiv – CS AI | Xinyu Zhou, Chang Jin, Carsten Eickhoff, Zhijiang Guo, Seyed Ali Bahrainian

AI Summary

Researchers developed a new training method combining Chain-of-Thought supervision with reinforcement learning to teach large language models when to abstain from answering temporal questions they're uncertain about. Their approach enabled a smaller Qwen2.5-1.5B model to outperform GPT-4o on temporal question answering tasks while improving reliability by 20% on unanswerable questions.

Key Takeaways
  • LLMs often produce misleading answers instead of admitting uncertainty, particularly in temporal reasoning tasks.
  • A new training pipeline combining Chain-of-Thought supervision with reinforcement learning teaches models when to abstain from answering.
  • The method enabled Qwen2.5-1.5B to surpass GPT-4o by 3.46% and 5.80% on TimeQA benchmarks.
  • Reinforcement learning improved reasoning accuracy, while supervised fine-tuning alone led to overconfidence.
  • Implicit reasoning cues provided limited benefits compared to explicit Chain-of-Thought supervision for abstention training.