AI · Neutral · arXiv – CS AI · 4d ago · 7/10
🧠Researchers prove that large language models fundamentally cannot perform causal discovery through standard training methods, establishing this limitation as intrinsic to supervised learning rather than a model-specific flaw. They propose Agentic Causal Bayesian Optimization (A-CBO), which bypasses this constraint by using frozen language models as query oracles within an external optimization loop, achieving superior performance on causal inference benchmarks.
AI · Neutral · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce a scenario-grounded benchmark for evaluating large language models in scientific discovery and find that models score markedly lower on it than on general science benchmarks. The framework tests LLMs across biology, chemistry, materials science, and physics through project-level tasks involving hypothesis generation and experimental design. The results suggest that current models, though promising in specific applications, remain far from general scientific superintelligence.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠ATHENA is an autonomous AI framework for scientific computing and machine learning research. It selects mathematical approaches, generates code, and iteratively refines solutions through a contextual bandit learning process. The system reaches validation errors as low as 10⁻¹⁴ and outperforms traditional foundation models on complex multiphysics problems.
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers present AI CFD Scientist, an open-source AI agent framework that autonomously conducts computational fluid dynamics research by combining literature review, physics simulation, vision-based verification, and manuscript generation. The system demonstrates measurable improvements in turbulence modeling and detects failure modes that traditional solver checks miss, representing a significant step toward AI-driven scientific discovery in high-fidelity physical simulation.
🧠 GPT-5
AI · Bullish · arXiv – CS AI · May 7 · 7/10
🧠Researchers propose Experiment-as-Code (EaC) Labs, a new paradigm that connects AI agents to physical laboratory equipment by encoding experiments as declarative configurations compiled to device-level APIs. A systems layer between the agents and the instruments performs safety checks, resource allocation, and job orchestration, extending AI-driven scientific discovery beyond purely digital environments.
AI · Bullish · arXiv – CS AI · May 1 · 7/10
🧠Researchers introduce Intern-Atlas, a methodological evolution graph built from over 1 million AI papers that automatically maps how research methods develop and relate to one another. The infrastructure captures explicit causal relationships between methodologies and enables AI-driven research agents to reconstruct innovation timelines, addressing a critical gap in existing document-centric research systems.
AI · Bullish · arXiv – CS AI · May 1 · 7/10
🧠Researchers introduce machine collective intelligence, a paradigm combining symbolic reasoning and metaheuristics to autonomously discover governing equations from empirical data. The approach recovers underlying equations across deterministic, stochastic, and uncharacterized systems while reducing extrapolation error by up to six orders of magnitude compared to deep neural networks and condensing millions of parameters into just 5-40 interpretable ones.
AI · Neutral · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers introduced PRL-Bench, a comprehensive benchmark measuring large language models' ability to conduct autonomous physics research across five subfields. Frontier AI models scored below 50% on it, exposing a significant capability gap between current LLMs and the demands of real-world scientific discovery.
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers introduce GIANTS, a framework for training language models to anticipate scientific breakthroughs by synthesizing insights from foundational papers. The team releases GiantsBench, a 17k-example benchmark across eight scientific domains, and GIANTS-4B, a 4B-parameter model that outperforms larger proprietary baselines by 34% while generalizing to unseen research areas.
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠Researchers have developed ML-Master 2.0, an autonomous AI agent for ultra-long-horizon machine learning tasks built on a Hierarchical Cognitive Caching architecture. The system achieved a 56.44% medal rate on OpenAI's MLE-Bench, maintaining strategic coherence over experimental cycles spanning days or weeks.
🏢 OpenAI
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠An NSF workshop community paper outlines strategic priorities for strengthening the intersection between artificial intelligence and mathematical/physical sciences (AI+MPS). The report proposes three key activities: enabling bidirectional AI+MPS research, building interdisciplinary communities, and fostering education and workforce development in this rapidly evolving field.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduced AI4S-SDS, a neuro-symbolic framework combining multi-agent collaboration with Monte Carlo Tree Search for automated chemical formulation design. The system addresses LLM limitations in materials science applications and successfully identified a novel photoresist developer formulation that matches commercial benchmarks in preliminary lithography experiments.
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduce MACC (Multi-Agent Collaborative Competition), a new institutional architecture that combines multiple AI agents based on large language models to improve scientific discovery. The system addresses limitations of single-agent approaches by incorporating incentive mechanisms, shared workspaces, and institutional design principles to enhance transparency, reproducibility, and exploration efficiency in scientific research.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers have developed FM Agent, a multi-agent AI framework that combines large language models with evolutionary search to autonomously solve complex research problems. The system achieved state-of-the-art results across multiple domains including operations research, machine learning, and GPU optimization without human intervention.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers have developed a new framework that uses large language models to guide symbolic regression in discovering interpretable physical laws from high-dimensional materials data. The method shrinks the search space by a factor of roughly 10⁵ compared with traditional approaches and identified novel formulas for key properties of perovskite materials.
AI · Bullish · OpenAI News · Feb 13 · 7/10
🧠OpenAI's GPT-5.2 has independently derived a new mathematical formula for gluon amplitudes in theoretical physics, which OpenAI and academic collaborators subsequently proved and verified formally. This represents a significant advance in AI's capability to contribute to fundamental scientific research and discovery.
AI · Bullish · Google DeepMind Blog · Feb 9 · 7/10
🧠According to a growing body of research papers, Google's Gemini Deep Think is having a significant impact on mathematical and scientific research, accelerating discovery across several academic and research domains.
AI · Bullish · MIT News – AI · Feb 2 · 7/10
🧠MIT researchers developed DiffSyn, a generative AI model that provides recipes for synthesizing new materials. This breakthrough could accelerate scientific experimentation by reducing the time from hypothesis to practical application.
AI · Bullish · Google DeepMind Blog · Nov 24 · 7/10
🧠Google DeepMind has partnered with the U.S. Department of Energy on Genesis, a new national initiative designed to accelerate scientific discovery and innovation through artificial intelligence. This collaboration represents a significant government-private sector partnership in advancing AI applications for scientific research.
AI · Bullish · OpenAI News · Nov 20 · 7/10
🧠OpenAI has released the first research cases demonstrating how GPT-5 accelerates scientific discovery across mathematics, physics, biology, and computer science. The AI system is shown collaborating with researchers to generate mathematical proofs, uncover new insights, and significantly increase the pace of scientific progress.
AI · Bullish · Google DeepMind Blog · May 17 · 6/10
🧠Google has launched Gemini for Science, a collection of AI-powered tools and experiments designed to accelerate scientific discovery and research across multiple disciplines. The initiative aims to enhance the scale and precision of scientific exploration by leveraging advanced AI capabilities.
🧠 Gemini
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠Researchers introduce MaD Physics, a benchmark for evaluating AI agents' ability to conduct scientific discovery under realistic resource constraints. The benchmark tests agents' capacity to make informative measurements within budget limits and infer underlying physical laws, using altered physics environments to prevent reliance on training data.
🧠 Gemini
AI · Bearish · arXiv – CS AI · May 12 · 6/10
🧠A new position paper argues that despite functioning as useful co-scientists, agentic AI systems are fundamentally not designed for truly autonomous scientific discovery due to challenges in problem selection bias, insufficient tacit knowledge in training data, compressed output diversity, and lack of real-world experimental feedback loops.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠Researchers introduce POETS, a novel framework that optimizes large language models through compute-efficient policy ensembles while quantifying uncertainty. By leveraging KL-regularized Thompson sampling and shared backbone architectures with independent LoRA branches, POETS achieves superior sample efficiency in scientific discovery tasks while reducing computational overhead compared to traditional ensemble methods.
AI · Neutral · arXiv – CS AI · May 9 · 6/10
🧠Researchers conducted a user study with 11 expert mathematicians using AlphaEvolve, an AI coding agent, to explore how humans effectively collaborate with AI systems for scientific discovery. The study identified a cyclical workflow called 'intentmaking'—where users iteratively define and refine experimental goals through system interaction—paired with traditional sensemaking, suggesting AI tools should function as collaborative instruments rather than black-box assistants.