Rebuttals Move Peer-Review Scores, but Initial-Review Structure Bounds the Movement
Researchers analyzed 73,000 reviewer trajectories from ICLR 2024-2025 to measure how author rebuttals affect peer-review scores. Using LLMs as measurement tools, they found that rebuttals do move scores, but the structure of the initial review predicts most of that movement, so rebuttal effects are measurable but bounded.
This study addresses a fundamental transparency gap in academic peer review by quantifying rebuttal effectiveness with computational methods. The researchers used archived pre- and post-rebuttal scores to isolate rebuttal content from confounding factors such as reviewer confidence and discussion dynamics. The methodology is notable for treating LLMs as measurement instruments rather than decision-makers, keeping humans in an oversight role while scaling the analysis to 73,000 reviewer trajectories.
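Treating an LLM as a measurement instrument implies checking its labels against human judgment. One common way to do that, shown here as an illustration rather than the study's documented procedure, is Cohen's kappa, which corrects raw agreement for the agreement two raters would reach by chance. The function name, the label values `"resolved"`/`"unresolved"`, and the toy annotations below are all hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each rater labeled independently at their own base rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label for every item; agreement is trivially perfect.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical human and LLM labels for one exchange feature; not data from the study.
human = ["resolved", "unresolved", "resolved", "resolved", "unresolved", "resolved"]
llm = ["resolved", "unresolved", "unresolved", "resolved", "unresolved", "resolved"]
print(round(cohens_kappa(human, llm), 3))  # 0.667
```

Here the raters agree on 5 of 6 items, but their base rates alone would produce 50% agreement, so kappa is (5/6 - 0.5) / 0.5 = 2/3.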
The findings reveal nuanced dynamics in peer review that complicate common assumptions about rebuttal influence. When the initial review text reads below the assigned score, only 8.3% of reviewers raise their score after the rebuttal; when the text reads above the score, the rate rises to 31.9%. This pattern suggests that reviewers rarely move against their own written assessment but are more receptive to a rebuttal when their text already supports a higher score. The authors' 44-feature taxonomy of reviewer-author exchanges provides actionable categories for distinguishing successful from failed rebuttals, and 23 of its features replicate across models and validation sets.
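The 8.3% and 31.9% figures are conditional rates: the share of reviewers whose score rose, computed separately within each text-versus-score group. A minimal sketch of that computation, assuming each trajectory has already been tagged with its group (the tagging step itself, the group names, and the toy records are hypothetical):

```python
from collections import defaultdict


def increase_rate_by_group(trajectories):
    """Fraction of reviewers in each group whose post-rebuttal score exceeds the pre-rebuttal score.

    Each trajectory is a (group, pre_score, post_score) tuple, where `group` is a
    hypothetical tag such as "text_below_score" or "text_above_score".
    """
    totals = defaultdict(int)
    increases = defaultdict(int)
    for group, pre_score, post_score in trajectories:
        totals[group] += 1
        if post_score > pre_score:
            increases[group] += 1
    return {group: increases[group] / totals[group] for group in totals}


# Toy trajectories for illustration only; the study reports 8.3% and 31.9%.
toy = [
    ("text_below_score", 5, 5),
    ("text_below_score", 3, 5),
    ("text_below_score", 6, 6),
    ("text_below_score", 5, 3),
    ("text_above_score", 5, 6),
    ("text_above_score", 3, 3),
]
print(increase_rate_by_group(toy))
# {'text_below_score': 0.25, 'text_above_score': 0.5}
```

Score decreases count as "not increased" here, matching the framing of the reported rates as increase rates.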
The predictive modeling shows that initial review structure alone reaches an AUC of 0.747 for predicting score movement, meaning much of the predictable variation in score movement is already captured before any rebuttal is read. Adding resolved exchange signals raises the AUC only to 0.804, a gain of 0.057, indicating that structural factors carry most of the predictive signal relative to rebuttal content. This suggests that review quality and the reviewer's initial position largely shape outcomes before rebuttals occur.
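AUC has a concrete reading that helps interpret the 0.747 versus 0.804 comparison: it is the probability that a randomly chosen trajectory whose score moved receives a higher predicted score than a randomly chosen trajectory whose score did not. The sketch below computes it directly from that definition for two hypothetical models; the prediction values are invented for illustration and are not outputs from the study.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive example outranks
    a random negative example (ties count as half a win)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions; label 1 means the reviewer's score moved.
moved = [1, 1, 1, 0, 0, 0]
structure_only = [0.9, 0.4, 0.6, 0.5, 0.3, 0.7]    # initial-review features alone
with_exchanges = [0.9, 0.6, 0.7, 0.5, 0.3, 0.65]   # same model plus exchange signals

print(round(roc_auc(structure_only, moved), 3))  # 0.667
print(round(roc_auc(with_exchanges, moved), 3))  # 0.889
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; at the scale of tens of thousands of trajectories, a rank-based computation such as scikit-learn's `sklearn.metrics.roc_auc_score` is the practical choice.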
These findings have implications for improving peer review processes. The research suggests focusing on initial review quality and reviewer calibration rather than assuming that strong rebuttals can overcome systemic issues. The failure modes identified in exchanges offer targets for training reviewers and for guiding authors toward effective rebuttal strategies. Understanding these constraints sets more realistic expectations for peer review reform.
- Initial review structure predicts most score movement (AUC 0.747); adding exchange information raises AUC only to 0.804, bounding rebuttal impact
- Score increase rates vary dramatically from 8.3% to 31.9% depending on whether review text reads below or above the assigned score
- A 44-feature taxonomy of reviewer-author exchanges was developed, with 23 features replicating robustly across models and validation years
- Rebuttals show measurable but bounded effects, with most robust exchange signals reflecting rebuttal failure modes rather than successes
- The study analyzed 73,000 reviewer trajectories from ICLR 2024-2025 using LLMs as measurement instruments while maintaining human oversight