y0news
AnalyticsDigestsSourcesTopicsRSSAICrypto

#research-evaluation News & Analysis

1 article tagged with #research-evaluation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

1 articles
AIBearisharXiv โ€“ CS AI ยท Mar 26/1017
๐Ÿง 

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

Researchers created CMT-Benchmark, a new dataset of 50 expert-level condensed matter theory problems to evaluate large language models' capabilities in advanced scientific research. The best performing model (GPT5) solved only 30% of problems, with the average across 17 models being just 11.4%, highlighting significant gaps in current AI's physical reasoning abilities.