AI · Neutral · arXiv CS AI · 7h ago · 6/10
RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension
Researchers introduce RPC-Bench, a large-scale benchmark of 15,000 human-verified question-answer pairs designed to evaluate how well AI models understand research papers. Testing shows that even the strongest models, such as GPT-5, reach only 68.2% accuracy on comprehension tasks, and scores drop significantly once conciseness is factored in, exposing critical gaps in academic document understanding.
GPT-5