y0news
#research-evaluation1 article
1 articles
AIBearisharXiv โ€“ CS AI ยท 6h ago4
๐Ÿง 

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

Researchers created CMT-Benchmark, a new dataset of 50 expert-level condensed matter theory problems to evaluate large language models' capabilities in advanced scientific research. The best performing model (GPT5) solved only 30% of problems, with the average across 17 models being just 11.4%, highlighting significant gaps in current AI's physical reasoning abilities.