#research-evaluation News & Analysis

3 articles tagged with #research-evaluation. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bearish · arXiv – CS AI · Jun 9 · 7/10

Contemporary AI lacks the imagination to diverge or negate in science

A major peer-reviewed study of 6,749 scientists evaluated AI-generated research ideas and found that large language models lack imagination in scientific discovery, struggle to propose null hypotheses, and show weak agreement with human expert judgment. The research reveals significant limitations in AI's ability to accelerate science despite widespread industry optimism.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

ReviewGuard: Aligning LLM-Assisted Peer Review with Long-Term Scientific Impact

Researchers introduce ReviewGuard, an LLM-based framework that predicts long-term scientific impact rather than mimicking human peer reviewers. Testing on 20,861 AI/ML papers shows ReviewGuard correlates 5.6x better with future citations than human reviewers and identifies high-impact rejected papers at significantly higher rates, suggesting AI can complement editorial decision-making without replacing human judgment.

AI · Bearish · arXiv – CS AI · Mar 2 · 6/10 · 17

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

Researchers created CMT-Benchmark, a new dataset of 50 expert-level condensed matter theory problems, to evaluate large language models' capabilities in advanced scientific research. The best-performing model (GPT5) solved only 30% of the problems, and the average across 17 models was just 11.4%, highlighting significant gaps in current AI's physical reasoning abilities.