Gaming AI-Assisted Peer Reviews Poses New Risks to the Scientific Community
Researchers demonstrate that AI-assisted peer review systems are vulnerable to simple adversarial attacks, with superficial abstract rephrasing increasing acceptance ratings by up to 1.31 points on a 10-point scale without changing underlying scientific content. The low-cost manipulation ($1, 5 minutes) reveals systemic risks in AI-mediated scientific evaluation and raises concerns about authors optimizing for algorithmic judgment rather than merit.
The vulnerability of AI peer review systems to abstract rephrasing represents a critical inflection point in how artificial intelligence integrates into scientific validation infrastructure. The attack succeeds across disciplines and venues, achieving a 38% success rate on average and over 50% when initial reviews suggest rejection. This isn't merely a technical flaw; it exposes a fundamental tension in delegating high-stakes decisions to AI systems without robust safeguards. The researchers demonstrate that adversarial abstracts boost not just overall scores but also confidence levels and assessments of core criteria like soundness and significance, meaning inflated reviews influence downstream human decision-making at editorial levels.
The broader context is a rush to deploy AI tools for efficiency gains in peer review, where reviewer burden is a genuine problem. Publishers and platforms view AI assistance as a solution to publication backlogs and reviewer workload. However, this study reveals that treating AI systems as neutral evaluators creates perverse incentives: authors gain a competitive advantage by optimizing presentation for algorithms rather than advancing science. The cost-to-benefit ratio heavily favors manipulation, which requires minimal investment and expertise.
For the scientific ecosystem, this poses a structural risk. When acceptance decisions increasingly depend on AI review scores, the selection pressure shifts from scientific merit to algorithmic gaming. This could compound quality issues in published research and erode peer review's gatekeeping function. The findings call for transparency about which venues use AI for final decisions versus initial screening, systematic robustness testing before deployment, and human-in-the-loop oversight. Without intervention, AI-assisted peer review may become a liability rather than an efficiency tool, potentially accelerating publication of weaker research while benefiting authors with the resources to optimize for machines.
- Simple abstract rephrasing without content changes increases AI review acceptance ratings by 0.88-1.31 points and achieves 38% overall attack success, rising above 50% when initial reviews suggest rejection.
- The manipulation costs only $1 and takes 5 minutes per submission, making it economically viable for widespread exploitation across academic disciplines.
- Inflated AI reviews influence core scientific criteria assessments including soundness, significance, and perceived contribution, not just overall scores.
- AI-mediated peer review creates perverse incentives for authors to optimize manuscripts for algorithmic judgment rather than scientific merit, potentially degrading research quality.
- Current AI tools lack systematic robustness testing and safeguards for high-stakes peer review, requiring transparent oversight and human decision-making authority before wider deployment.
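
As a rough illustration of what a pre-deployment robustness check might measure, the sketch below compares AI review scores for the same papers before and after an abstract is rephrased. It is not the researchers' method: the function name `robustness_report`, the acceptance threshold of 6.0 on a 10-point scale, and the definition of a "flip" (a paper that scored below the threshold and reaches it after rephrasing) are all assumptions made for this example, and the scores in the usage note are made up.

```python
from statistics import mean


def robustness_report(original, rephrased, accept_threshold=6.0):
    """Summarize how much rephrasing alone moves AI review scores.

    original / rephrased: paired scores for the same papers on a 10-point scale.
    accept_threshold: hypothetical cut-off for an "accept" signal (an assumption).
    """
    if not original or len(original) != len(rephrased):
        raise ValueError("score lists must be non-empty and the same length")

    pairs = list(zip(original, rephrased))

    # Average score change attributable to the rephrased abstract.
    mean_uplift = mean(after - before for before, after in pairs)

    # Among papers that initially scored below the threshold, the share
    # that reach it after rephrasing (an assumed notion of attack success).
    initially_rejected = [(b, a) for b, a in pairs if b < accept_threshold]
    if initially_rejected:
        flips = sum(1 for _, a in initially_rejected if a >= accept_threshold)
        flip_rate = flips / len(initially_rejected)
    else:
        flip_rate = 0.0

    return {"mean_uplift": mean_uplift, "flip_rate_among_rejects": flip_rate}


if __name__ == "__main__":
    # Made-up scores for four papers, before and after rephrasing.
    before = [4.0, 5.0, 7.0, 5.5]
    after = [6.0, 5.0, 7.5, 6.5]
    print(robustness_report(before, after))
```

A venue could run a check of this kind on a held-out set of past submissions and treat a large uplift or flip rate as a signal that AI scores should not feed directly into decisions without human review.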