AI · Neutral · arXiv – CS AI · May 1 · 6/10
🧠A comprehensive survey examines how large language models can assist or automate peer review processes across academia, synthesizing techniques for review generation, post-review tasks, and evaluation methods. The research catalogs datasets and modeling approaches while addressing ethical concerns and practical implementation challenges for integrating AI into scholarly publishing workflows.
AI · Neutral · arXiv – CS AI · Apr 20 · 6/10
🧠Researchers compared large language models with human responses in a behavioral study on accuracy perception, finding that LLMs reproduce directional effects but with inconsistent effect magnitudes across different models. The study reveals that off-the-shelf LLMs can simulate some human belief-updating patterns in controlled experiments but lack reliable human-scale accuracy, establishing clearer boundaries for when synthetic LLM data is appropriate for behavioral research.
AI · Bearish · arXiv – CS AI · Apr 20 · 6/10
🧠A new study reveals that using large language models to generate synthetic datasets ("silicon samples") produces highly variable results depending on configuration choices, with correlation outcomes ranging from r=.23 to r=.84 on the same task. This demonstrates that analytic flexibility in LLM-based data generation poses a significant threat to research validity and reproducibility in social science applications.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers introduce SciPredict, a benchmark testing whether large language models can predict scientific experiment outcomes across physics, biology, and chemistry. The study reveals that while some frontier models marginally exceed human experts (~20% accuracy), they fundamentally fail to assess prediction reliability, suggesting superhuman performance in experimental science requires not just better predictions but better calibration awareness.
AI · Neutral · arXiv – CS AI · Apr 14 · 6/10
🧠Researchers propose AI as a Research Object (AI-RO), a governance framework that treats generative AI interactions as inspectable, documented components of scientific research, shifting the focus away from debates over authorship. The framework combines interaction logs, metadata packaging, and provenance records to ensure accountability, particularly for security and privacy research where confidentiality and auditability are critical.
🏢 Meta
AI · Bearish · arXiv – CS AI · Mar 3 · 6/10 · 6
🧠Researchers compared human survey responses from 420 Silicon Valley developers with synthetic data from five leading LLMs including ChatGPT, Claude, and Gemini. While AI models produced technically plausible results, they failed to capture counterintuitive insights and only replicated conventional wisdom rather than revealing novel findings.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 3
🧠Researchers introduce ScholarEval, a retrieval-augmented framework for evaluating AI-generated research ideas based on soundness and contribution metrics. The system outperformed OpenAI's o1-mini-deep-research baseline across multiple evaluation criteria in testing with 117 expert-annotated research ideas across four scientific disciplines.
AI · Bullish · arXiv – CS AI · Feb 27 · 5/10 · 4
🧠Researchers used large language models to conduct a comprehensive review of artificial intelligence applications in life cycle assessment (LCA), analyzing trends and patterns in the literature. The study found dramatic growth in AI adoption for environmental assessments, with a notable shift toward LLM-driven approaches and strong correlations between AI methods and LCA stages.
AI · Neutral · arXiv – CS AI · Feb 27 · 6/10 · 6
🧠Researchers published a case study demonstrating successful human-AI collaboration in mathematical research, extending Hermite quadrature rule results beyond what was feasible by hand. The study reveals AI's strengths in algebraic manipulation and proof exploration, while highlighting the critical need for human verification and domain expertise at every step of the research process.
AI · Neutral · arXiv – CS AI · Mar 4 · 4/10 · 3
🧠Researchers developed a framework using large language models to simulate virtual respondents for validating psychometric survey items, addressing the challenge of ensuring construct validity without costly human data collection. The approach uses trait-response mediators to identify survey items that robustly measure intended psychological traits across three major trait theories.
AI · Neutral · OpenAI News · May 2 · 4/10 · 4
🧠The article takes a deeper look at earlier findings on sycophancy issues, examining what went wrong in the initial assessment. It outlines the changes and improvements the organization plans to make based on its expanded understanding.