FirstPass: Grounding AI Scientific Judgment in Multi-Round Editorial Outcomes
Researchers introduce FirstPass, a dataset and fine-tuned AI model that significantly improves peer-review prediction by training on 3,668 multi-round editorial dialogues from Nature Communications across five scientific domains. The model achieves 80.5% accuracy in predicting editorial outcomes, outperforming existing systems by grounding AI judgment in real iterative peer-review processes rather than stylistic mimicry.
FirstPass addresses a fundamental limitation in current AI peer-review systems: they lack grounding in the messy, iterative reality of scientific validation. By curating transparent peer-review data from Nature Communications, researchers have created the first large-scale dataset capturing complete multi-round editorial dialogues across biology, chemistry, neuroscience, physics, and earth science. This cross-domain breadth matters because peer-review practices vary significantly by discipline, yet most prior AI systems train exclusively on computer science venues.
The technical innovation centers on response-only loss masking during fine-tuning, in which the training loss is computed only on the tokens the model is asked to generate and not on the input prompt. The approach proves essential rather than merely helpful. Without it, the model scores 62.0%, below baseline accuracy; with it, FirstPass reaches 80.5% accuracy in predicting whether papers enter Standard or Extended revision cycles. This finding suggests that previous peer-review AI systems may have fundamentally misaligned their training objectives, conflating reviewer language patterns with editorial judgment.
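As a rough illustration of what response-only loss masking means in general, the sketch below masks prompt positions out of the loss so that only response tokens contribute. It is not the FirstPass training code, which the summary does not describe: the function names are hypothetical, the `-100` sentinel is borrowed from the default `ignore_index` of PyTorch's `CrossEntropyLoss`, and the usual next-token shift is omitted for brevity.

```python
from typing import List, Sequence

# Assumed sentinel for "do not train on this position"; -100 is the default
# ignore_index of PyTorch's CrossEntropyLoss, used here only by convention.
IGNORE_INDEX = -100


def build_labels(prompt_ids: Sequence[int], response_ids: Sequence[int]) -> List[int]:
    """Return labels for the concatenated sequence (hypothetical helper).

    Prompt positions are masked; response positions keep their token ids.
    """
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)


def masked_nll(token_log_probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean negative log-likelihood over unmasked (response) positions only.

    token_log_probs[i] is assumed to be the log-probability the model assigns
    to the token at position i, already aligned with labels.
    """
    kept = [-lp for lp, y in zip(token_log_probs, labels) if y != IGNORE_INDEX]
    if not kept:
        raise ValueError("no response tokens to train on")
    return sum(kept) / len(kept)


# Toy example: three prompt tokens, two response tokens.
labels = build_labels(prompt_ids=[5, 6, 7], response_ids=[8, 9])
loss = masked_nll([-3.0, -3.0, -3.0, -1.0, -2.0], labels)
```

In this toy case only the last two positions enter the average, so however poorly the model predicts the prompt tokens, the gradient signal comes from the response alone.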
For the scientific community, FirstPass offers practical value as a pre-submission tool, enabling authors to simulate expert critique before formal submission. This democratizes access to editorial insights, potentially reducing revision cycles and accelerating publication timelines. The model's consistent cross-domain performance suggests genuine understanding rather than domain-specific overfitting.
The deployment scenario, in which the model functions as an anticipatory scientific co-author, represents a meaningful shift from treating AI as a review generator to positioning it as judgment-prediction infrastructure. Future work likely involves integration into manuscript management systems and validation against editorial outcomes from other publishers to test generalization.
- FirstPass achieves 80.5% accuracy in predicting editorial outcomes using multi-round peer-review dialogues, significantly outperforming prior AI systems
- Response-only loss masking proved essential: removing it drops accuracy from 80.5% to 62.0%
- The dataset spans 3,668 complete peer-review cycles across five scientific domains, providing unprecedented scale and cross-disciplinary representation
- The model generates reviews averaging 1,187 words, substantially closer to human length (2,155 words) than existing baselines
- Deployed as a pre-submission tool, FirstPass lets authors predict revision requirements before formal submission