AI · Bearish · arXiv CS AI · 4h ago
CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers
Researchers created CMT-Benchmark, a dataset of 50 expert-level condensed matter theory problems for evaluating how well large language models handle advanced scientific research. The best-performing model (GPT-5) solved only 30% of the problems, and the average across 17 models was just 11.4%, highlighting significant gaps in current AI's physical reasoning abilities.