arXiv — CS AI · 14h ago
NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment
Researchers introduced NovBench, the first large-scale benchmark for evaluating how well large language models assess the novelty of academic papers. The benchmark comprises 1,684 paper-review pairs from a leading NLP conference and reveals that current LLMs struggle to comprehend scientific novelty, despite showing promise in supporting peer review.