
CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

arXiv – CS AI | Haining Pan, James V. Roggeveen, Erez Berg, Juan Carrasquilla, Debanjan Chowdhury, Surya Ganguli, Federico Ghimenti, Juraj Hasik, Henry Hunt, Hong-Chen Jiang, Mason Kamb, Ying-Jer Kao, Ehsan Khatami, Michael J. Lawler, Di Luo, Titus Neupert, Xiaoliang Qi, Michael P. Brenner, Eun-Ah Kim
🤖AI Summary

Researchers built CMT-Benchmark, a dataset of 50 expert-level condensed matter theory problems, to evaluate how well large language models handle advanced scientific research. The best-performing model (GPT5) solved only 30% of the problems, and the average across 17 models was just 11.4%, pointing to significant gaps in the physical reasoning of current AI systems.

Key Takeaways
  • CMT-Benchmark contains 50 expert-level condensed matter theory problems, written by researchers worldwide to test AI capabilities in advanced physics.
  • GPT5 achieved the highest score at 30%, while the average across 17 major models was only 11.4%.
  • 18 problems were not solved by any of the 17 tested models, particularly in the Quantum Monte Carlo and density matrix renormalization group (DMRG) areas.
  • Current LLMs frequently produce answers that violate fundamental physical principles and symmetries.
  • The benchmark reveals substantial limitations in AI's ability to handle research-level scientific problems.
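
To put the percentages above in terms of problem counts, the short sketch below converts them using the 50-problem total. It assumes each reported score is a plain fraction of the 50 problems, which the summary does not state explicitly; the helper name `percent_to_count` is illustrative and not taken from the paper.

```python
# Minimal sketch: convert the reported percentages into problem counts.
# Assumption: each score is a simple fraction of the 50 benchmark problems.
TOTAL_PROBLEMS = 50
UNSOLVED_BY_ALL = 18  # problems that none of the 17 models solved


def percent_to_count(percent: float, total: int = TOTAL_PROBLEMS) -> float:
    """Return how many problems a percentage score corresponds to."""
    return percent * total / 100


best_model_solved = percent_to_count(30.0)    # best model (GPT5)
average_solved = percent_to_count(11.4)       # mean across the 17 models
unsolved_share = UNSOLVED_BY_ALL * 100 / TOTAL_PROBLEMS

print(f"Best model: about {best_model_solved:.0f} of {TOTAL_PROBLEMS} problems")
print(f"Average model: about {average_solved:.1f} of {TOTAL_PROBLEMS} problems")
print(f"Unsolved by every model: {unsolved_share:.0f}% of the benchmark")
```

Under that assumption, the best score corresponds to roughly 15 problems, the average model to fewer than 6, and the problems no model solved make up more than a third of the benchmark.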