AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce MaD Physics, a benchmark for evaluating how well AI agents conduct scientific discovery under realistic resource constraints. It tests whether agents can choose informative measurements within a budget and infer the underlying physical laws. Its environments use altered physics so that agents cannot rely on knowledge memorized from training data.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce POETS, a framework for optimizing large language models with compute-efficient policy ensembles that also quantify uncertainty. It combines KL-regularized Thompson sampling with a shared backbone carrying independent LoRA branches. On scientific discovery tasks, POETS reports better sample efficiency than traditional ensemble methods at lower computational cost.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers conducted a user study with 11 expert mathematicians using AlphaEvolve, an AI coding agent, to explore how humans collaborate effectively with AI systems on scientific discovery. The study identified a cyclical workflow it calls 'intentmaking', in which users iteratively define and refine experimental goals by interacting with the system, alongside traditional sensemaking. The authors conclude that AI tools should function as collaborative instruments rather than black-box assistants.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers introduce InciteResearch, a multi-agent AI framework that uses Socratic questioning to help researchers turn vague, implicit research ideas into structured, actionable questions. The framework significantly outperforms baselines on TF-Bench, a new benchmark for tacit-to-explicit research assistance, positioning AI as a thinking tool rather than merely an automator of execution.
AI · Bullish · arXiv – CS AI · Apr 7 · 6/10
🧠Researchers introduce InferenceEvolve, a framework that uses large language models to automatically discover and refine causal inference methods. The system outperformed 58 human submissions in a recent competition, illustrating how LLM-driven evolutionary search can optimize complex scientific programs.
AI · Bullish · arXiv – CS AI · Mar 17 · 6/10
🧠Researchers propose ES-Merging, a framework for combining specialized biological multimodal large language models (MLLMs) using embedding-space signals rather than traditional parameter-based methods. The approach estimates merging coefficients at both layer-wise and element-wise granularity, and it outperforms existing merging techniques and even task-specific fine-tuned models on cross-modal scientific problems.
AI · Neutral · arXiv – CS AI · Mar 16 · 6/10
🧠Researchers introduce Budget-Sensitive Discovery Score (BSDS), a formally verified framework for evaluating AI-guided scientific candidate selection under budget constraints. Testing on drug discovery datasets reveals that simple random forest models outperform large language models, with LLMs providing no marginal value over existing trained classifiers.
AI · Bullish · Google DeepMind Blog · Mar 9 · 6/10
🧠The article examines the decade-long impact of DeepMind's AlphaGo breakthrough, tracing how the system's influence spread from games to scientific discovery in fields such as biology. It also discusses AlphaGo's role as a catalyst for research toward artificial general intelligence (AGI).
AI · Bullish · arXiv – CS AI · Mar 9 · 6/10
🧠A comprehensive survey examines how large multimodal language models are transforming scientific research across five key areas: literature search, idea generation, content creation, multimodal artifact production, and peer review evaluation. The research highlights both the potential for AI-assisted scientific discovery and the ethical concerns regarding research integrity and misuse of generative models.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers developed a method for generating 'alien' research directions by decomposing academic papers into 'idea atoms' and using AI models to identify coherent but non-obvious research paths. Applied to roughly 7,500 machine learning papers, the system finds viable directions that current researchers are unlikely to propose on their own.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers have introduced SciDER, an AI-powered system that automates the scientific research pipeline, from data analysis and hypothesis generation through code execution. Its specialized agents collaborate to process raw experimental data, and the system outperforms existing general-purpose AI models on scientific discovery tasks.
AI · Bullish · arXiv – CS AI · Mar 3 · 6/10
🧠Researchers have developed EDT-Former, an Entropy-guided Dynamic Token Transformer that improves how large language models understand molecular graphs. The system achieves state-of-the-art results on molecular understanding benchmarks and remains computationally efficient because it avoids costly fine-tuning of the LLM backbone.
AI · Neutral · arXiv – CS AI · Feb 27 · 4/10
🧠Researchers have introduced LLM4AD, a unified Python platform that uses large language models for algorithm design across optimization, machine learning, and scientific discovery. The platform offers modular components, comprehensive evaluation tools, and extensive support resources, including tutorials and a graphical user interface, to facilitate LLM-assisted algorithm development.
AI · Neutral · Google Research Blog · Oct 20 · 4/10
🧠Google's Gemini AI is being taught to identify exploding stars (supernovas) using few-shot learning, showing that it can recognize rare astronomical phenomena from only a handful of examples.