CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

arXiv – CS AI | Haining Pan, James V. Roggeveen, Erez Berg, Juan Carrasquilla, Debanjan Chowdhury, Surya Ganguli, Federico Ghimenti, Juraj Hasik, Henry Hunt, Hong-Chen Jiang, Mason Kamb, Ying-Jer Kao, Ehsan Khatami, Michael J. Lawler, Di Luo, Titus Neupert, Xiaoliang Qi, Michael P. Brenner, Eun-Ah Kim | March 2, 2026 at 05:00 AM

🤖 AI Summary

Researchers created CMT-Benchmark, a dataset of 50 expert-level condensed matter theory problems, to evaluate how well large language models handle advanced scientific research. The best-performing model (GPT5) solved only 30% of the problems, and the average across 17 models was just 11.4%, highlighting significant gaps in current AI's physical reasoning.

Key Takeaways

→ CMT-Benchmark contains 50 expert-level condensed matter theory problems, designed by researchers worldwide to test AI capabilities in advanced physics.
→ GPT5 achieved the highest score at 30%, while the average across 17 major models was only 11.4%.
→ 18 problems were not solved by any of the 17 tested models, particularly in the Quantum Monte Carlo and DMRG areas.
→ Current LLMs frequently produce answers that violate fundamental physical principles and symmetries.
→ The benchmark reveals substantial limitations in AI's ability to handle research-level scientific problems.

#ai-benchmarks #large-language-models #scientific-ai #physics #research-evaluation #ai-limitations #condensed-matter #quantum-computing #machine-learning #ai-performance
