AI · Neutral · arXiv – CS AI · May 29 · 7/10
🧠Researchers introduce PRAIB, a benchmark framework that evaluates how Large Language Models perform peer review compared to human reviewers. Analysis of 11,000 LLM-generated reviews across major AI conferences reveals significant behavioral divergences: LLM ratings show less variability, positive bias, overconfidence, and frequently miss atomic weaknesses that human reviewers catch.
AI · Bearish · arXiv – CS AI · May 9 · 7/10
🧠A comprehensive study reveals that while AI adoption in research has surged exponentially since 2015, the technology remains concentrated in narrow domains tied to computer science with limited epistemological transformation. The research identifies concerning patterns including higher retraction rates in AI-supported work, citation inflation, and geographic disparities in adoption across countries and disciplines.
AI · Neutral · arXiv – CS AI · Jun 5 · 6/10
🧠EGTR-Review presents a novel framework for automating scientific peer review using a multi-agent teacher model that distills its reasoning into a lightweight student model, achieving superior performance with significantly lower computational costs while maintaining evidence traceability and factual grounding.
AI · Neutral · arXiv – CS AI · Jun 4 · 5/10
🧠Researchers propose an automated technique for generating research paper titles from abstracts using large language models, testing multiple approaches including fine-tuned PEGASUS and zero-shot GPT-3.5-turbo. Fine-tuned PEGASUS-large emerges as the top performer, though ChatGPT demonstrates creative title generation capabilities, suggesting AI-generated titles are practical and reliable for academic publishing workflows.
🧠 ChatGPT
AI · Neutral · arXiv – CS AI · Jun 2 · 6/10
🧠Researchers developed AI-Paper-Review, a tool that generates structured peer review feedback for academic papers using multiple AI reviewers, and conducted a case study on 20 computer architecture submissions to measure how well AI review aligns with human review. The study finds that AI review can identify significant portions of human-raised issues while also surfacing problems missed by human reviewers, raising important questions about AI's role in academic peer review without endorsing its use for formal publication decisions.
AI · Neutral · arXiv – CS AI · Jun 1 · 6/10
🧠Researchers introduce Crafter, a multi-agent system for generating publication-quality scientific figures from diverse inputs that generalizes across figure types without architectural changes. The work addresses a critical gap in automation tools by enabling editable SVG outputs and introduces CraftBench, a comprehensive benchmark for evaluating figure generation across multiple types and input conditions.
AI · Neutral · arXiv – CS AI · May 28 · 6/10
🧠DiagramRAG is a new retrieval-augmented framework that converts rough sketches into publication-quality scientific diagrams by retrieving semantically and topologically compatible reference diagrams. The system achieves strong performance metrics (F1-scores of 0.848 and 0.802 on benchmark datasets) while maintaining efficient inference at 35.48 seconds per sample.
🏢 Hugging Face
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠Researchers introduce TADDLE, an AI system that detects quality deficiencies in LLM-generated peer reviews by decomposing analysis into specialized tools and multi-label classification. The work addresses a growing problem in academic publishing where AI-written reviews are fluent but potentially flawed, backed by the first expert-annotated benchmark of 1,800 reviews across six defect categories.
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠A new study demonstrates that pooled benchmarks for detecting AI-generated academic text systematically misrepresent AI adoption across countries and research fields by ignoring contextual stylistic variations. Using country-field-specific benchmarks instead provides more accurate measurements and reveals that previous estimates substantially over- or underestimated AI use depending on geographic and disciplinary context.
AI · Neutral · arXiv – CS AI · May 27 · 6/10
🧠CitePrism introduces a human-in-the-loop AI framework designed to assist editors and reviewers in auditing manuscript citations for relevance, accuracy, and ethical appropriateness. The system combines large language models, semantic similarity analysis, and metadata verification to flag potentially problematic citations, achieving moderate agreement with human reviewers in preliminary testing on a pavement engineering manuscript.
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce PaperFit, a vision-in-the-loop AI agent that automates the typesetting optimization of LaTeX scientific documents by iteratively rendering pages, diagnosing visual defects, and applying constrained repairs. The work formalizes Visual Typesetting Optimization (VTO) as a critical missing stage in document automation, addressing the gap between compilable but visually flawed PDFs and publication-ready outputs through a new benchmark of 200 papers.
AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠A comprehensive survey examines how large language models can assist or automate peer review processes across academia, synthesizing techniques for review generation, post-review tasks, and evaluation methods. The research catalogs datasets and modeling approaches while addressing ethical concerns and practical implementation challenges for integrating AI into scholarly publishing workflows.
AI · Bullish · arXiv – CS AI · Apr 15 · 6/10
🧠Researchers introduce GoodPoint, an AI system trained to generate constructive scientific feedback by learning from author responses to peer review. The method improves feedback quality by 83.7% over baseline models and outperforms larger LLMs like Gemini-3-flash, demonstrating that specialized training on valid, actionable feedback signals yields better results than general-purpose models.
🧠 Gemini
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduced NovBench, the first large-scale benchmark for evaluating how well large language models can assess research novelty in academic papers. The benchmark comprises 1,684 paper-review pairs from a leading NLP conference and reveals that current LLMs struggle with scientific novelty comprehension despite promise in peer review support.
AI · Neutral · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers introduce FactReview, an AI system that improves academic peer review by combining claim extraction, literature positioning, and code execution to verify research claims. The system addresses weaknesses in current LLM-based reviewing by grounding assessments in external evidence rather than relying solely on manuscript narratives.
AI · Neutral · arXiv – CS AI · Mar 11 · 4/10
🧠Researchers propose RbtAct, a novel approach that uses peer review rebuttals as supervision to train AI models for generating more actionable scientific review feedback. The system leverages a new dataset, RMR-75K, and a fine-tuned Llama-3.1-8B model to produce focused, implementable guidance rather than superficial comments.
🧠 Llama