AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment
Researchers have introduced NovBench, described as the first large-scale benchmark for evaluating how well large language models (LLMs) assess research novelty in academic papers. The benchmark comprises 1,684 paper-review pairs from a leading NLP conference, and its results indicate that current LLMs still struggle to understand scientific novelty, despite their promise for supporting peer review.