
When Silence Is Golden: Can LLMs Learn to Abstain in Temporal QA and Beyond?

arXiv – CS AI | Xinyu Zhou, Chang Jin, Carsten Eickhoff, Zhijiang Guo, Seyed Ali Bahrainian

AI Summary

Researchers developed a new training method combining Chain-of-Thought supervision with reinforcement learning to teach large language models when to abstain from answering temporal questions they're uncertain about. Their approach enabled a smaller Qwen2.5-1.5B model to outperform GPT-4o on temporal question answering tasks while improving reliability by 20% on unanswerable questions.

Key Takeaways
  • LLMs often produce misleading answers instead of admitting uncertainty, particularly in temporal reasoning tasks.
  • A new training pipeline combining Chain-of-Thought supervision with reinforcement learning teaches models when to abstain from answering.
  • The method enabled Qwen2.5-1.5B to surpass GPT-4o by 3.46% and 5.80% on TimeQA benchmarks.
  • Reinforcement learning improved reasoning accuracy, while supervised fine-tuning alone led to overconfidence.
  • Implicit reasoning cues provided limited benefits compared to explicit Chain-of-Thought supervision for abstention training.