AI · Neutral · arXiv – CS AI · 4d ago · 7/10
🧠Researchers prove that large language models fundamentally cannot perform causal discovery through standard training methods, establishing this limitation as intrinsic to supervised learning rather than a model-specific flaw. They propose Agentic Causal Bayesian Optimization (A-CBO), which bypasses this constraint by using frozen language models as query oracles within an external optimization loop, achieving superior performance on causal inference benchmarks.
AI · Bullish · arXiv – CS AI · May 11 · 7/10
🧠ATHENA is an AI framework that automates scientific computing and machine learning research by autonomously selecting mathematical approaches, generating code, and iteratively improving solutions through a contextual bandit learning process. The system achieves validation errors as low as 10^-14 and outperforms traditional foundation models on complex multiphysics problems.
AI · Neutral · arXiv – CS AI · May 11 · 7/10
🧠Researchers introduce a scenario-grounded benchmark for evaluating large language models in scientific discovery, revealing significant performance gaps compared to general science benchmarks. The framework tests LLMs across biology, chemistry, materials, and physics through project-level tasks involving hypothesis generation and experimental design, showing that current models remain distant from achieving general scientific superintelligence despite demonstrating promise in specific applications.
AI · Bullish · arXiv – CS AI · May 9 · 7/10
🧠Researchers present AI CFD Scientist, an open-source AI agent framework that autonomously conducts computational fluid dynamics research by combining literature review, physics simulation, vision-based verification, and manuscript generation. The system demonstrates measurable improvements in turbulence modeling and detects failure modes that traditional solver checks miss, representing a significant step toward AI-driven scientific discovery in high-fidelity physical simulation.
🧠 GPT-5
AI · Bullish · arXiv – CS AI · May 7 · 7/10
🧠Researchers propose Experiment-as-Code (EaC) Labs, a new paradigm that bridges AI agents with physical laboratory equipment by encoding experiments as declarative configurations compiled to device-level APIs. This framework combines artificial intelligence with automated lab instrumentation through a systems layer that performs safety checks, resource allocation, and job orchestration, enabling AI-driven scientific discovery beyond purely digital environments.
AI · Bullish · arXiv – CS AI · May 1 · 7/10
🧠Researchers introduce machine collective intelligence, a paradigm combining symbolic reasoning and metaheuristics to autonomously discover governing equations from empirical data. The approach recovers underlying equations across deterministic, stochastic, and uncharacterized systems while reducing extrapolation error by up to six orders of magnitude compared to deep neural networks and condensing millions of parameters into just 5-40 interpretable ones.
AI · Bullish · arXiv – CS AI · May 1 · 7/10
🧠Researchers introduce Intern-Atlas, a methodological evolution graph built from over 1 million AI papers that automatically maps how research methods develop and relate to one another. The infrastructure captures explicit causal relationships between methodologies and enables AI-driven research agents to reconstruct innovation timelines, addressing a critical gap in existing document-centric research systems.
AI · Neutral · arXiv – CS AI · Apr 20 · 7/10
🧠Researchers introduced PRL-Bench, a comprehensive benchmark measuring large language models' ability to conduct autonomous physics research across five subfields. Testing frontier AI models revealed performance below 50%, exposing a significant capability gap between current LLMs and the demands of real-world scientific discovery.
AI · Bullish · arXiv – CS AI · Apr 14 · 7/10
🧠Researchers introduce GIANTS, a framework for training language models to anticipate scientific breakthroughs by synthesizing insights from foundational papers. The team releases GiantsBench, a 17k-example benchmark across eight scientific domains, and GIANTS-4B, a 4B-parameter model that outperforms larger proprietary baselines by 34% while generalizing to unseen research areas.
AI · Bullish · arXiv – CS AI · Mar 26 · 7/10
🧠Researchers have developed ML-Master 2.0, an autonomous AI agent that achieves breakthrough performance in ultra-long-horizon machine learning tasks by using a Hierarchical Cognitive Caching architecture. The system achieved a 56.44% medal rate on OpenAI's MLE-Bench, demonstrating the ability to maintain strategic coherence over experimental cycles spanning days or weeks.
🏢 OpenAI
AI · Bullish · arXiv – CS AI · Mar 17 · 7/10
🧠An NSF workshop community paper outlines strategic priorities for strengthening the intersection between artificial intelligence and mathematical/physical sciences (AI+MPS). The report proposes three key activities: enabling bidirectional AI+MPS research, building interdisciplinary communities, and fostering education and workforce development in this rapidly evolving field.
AI · Bullish · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduced AI4S-SDS, a neuro-symbolic framework combining multi-agent collaboration with Monte Carlo Tree Search for automated chemical formulation design. The system addresses LLM limitations in materials science applications and successfully identified a novel photoresist developer formulation that matches commercial benchmarks in preliminary lithography experiments.
AI · Neutral · arXiv – CS AI · Mar 5 · 7/10
🧠Researchers introduce MACC (Multi-Agent Collaborative Competition), a new institutional architecture that combines multiple AI agents based on large language models to improve scientific discovery. The system addresses limitations of single-agent approaches by incorporating incentive mechanisms, shared workspaces, and institutional design principles to enhance transparency, reproducibility, and exploration efficiency in scientific research.
AI · Bullish · arXiv – CS AI · Mar 3 · 7/10
🧠Researchers have developed FM Agent, a multi-agent AI framework that combines large language models with evolutionary search to autonomously solve complex research problems. The system achieved state-of-the-art results across multiple domains including operations research, machine learning, and GPU optimization without human intervention.
AI · Bullish · arXiv – CS AI · Feb 27 · 7/10
🧠Researchers have developed a new framework that uses large language models to guide symbolic regression in discovering interpretable physical laws from high-dimensional materials data. The method reduces the search space by approximately 10^5 times compared to traditional approaches and successfully identified novel formulas for key properties of perovskite materials.
AI · Bullish · OpenAI News · Feb 13 · 7/10
🧠OpenAI's GPT-5.2 has independently derived a new mathematical formula for gluon amplitude in theoretical physics, which was subsequently formally proved and verified by OpenAI and academic collaborators. This represents a significant advancement in AI's capability to contribute to fundamental scientific research and discovery.
AI · Bullish · Google DeepMind Blog · Feb 9 · 7/10
🧠Google's Gemini Deep Think is demonstrating significant impact across mathematical and scientific research fields according to emerging research papers. The AI system is accelerating discovery processes in various academic and research domains.
AI · Bullish · MIT News – AI · Feb 2 · 7/10
🧠MIT researchers developed DiffSyn, a generative AI model that provides recipes for synthesizing new materials. This breakthrough could accelerate scientific experimentation by reducing the time from hypothesis to practical application.
AI · Bullish · Google DeepMind Blog · Nov 24 · 7/10
🧠Google DeepMind has partnered with the U.S. Department of Energy on Genesis, a new national initiative designed to accelerate scientific discovery and innovation through artificial intelligence. This collaboration represents a significant government-private sector partnership in advancing AI applications for scientific research.
AI · Bullish · OpenAI News · Nov 20 · 7/10
🧠OpenAI has released the first research cases demonstrating how GPT-5 accelerates scientific discovery across mathematics, physics, biology, and computer science. The AI system is shown collaborating with researchers to generate mathematical proofs, uncover new insights, and significantly increase the pace of scientific progress.
AI · Neutral · arXiv – CS AI · 10h ago · 6/10
🧠Researchers introduce Auto-Discovery-Bench, a diagnostic benchmark that tests AI agents' ability to maintain and update structured beliefs through iterative hypothesis-intervention-feedback cycles. The benchmark reveals that performance degrades significantly as complexity variables increase, and identifies limitations in long-range structured information integration as a key bottleneck for scientific discovery agents.
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠MOOSE-Copilot introduces a unified framework for scientific hypothesis discovery that combines exploratory ideation with fine-grained refinement through structured human-AI interaction. The web-based system enables scientists to guide LLM-powered discovery processes via initial blueprints, routing decisions, and feedback mechanisms, outperforming autonomous baselines while lowering accessibility barriers through an intuitive visual interface.
🏢 Microsoft
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce ProjectionBench, a novel evaluation framework that tests large language models' scientific discovery capabilities by progressively revealing information about research problems. The benchmark assesses both innovative reasoning with minimal context and grounded hypothesis generation with full experimental details across 45 materials science papers, finding that GPT-5.4 and Gemini 3.1 Pro achieve strong alignment with ground-truth conclusions.
🧠 GPT-5 🧠 Gemini
AI · Neutral · arXiv – CS AI · 3d ago · 6/10
🧠Researchers introduce Influence-Guided Symbolic Regression (IGSR), a novel framework combining LLMs with Monte Carlo Tree Search to discover scientific equations more efficiently. The method uses granular influence scores to evaluate which components of equations contribute to accuracy, enabling systematic refinement. The approach demonstrated genuine discovery potential by identifying a novel relationship between DNA methylation and RNA Polymerase II pausing that was subsequently validated experimentally.
AI · Bullish · Google DeepMind Blog · May 17 · 6/10
🧠Google has launched Gemini for Science, a collection of AI-powered tools and experiments designed to accelerate scientific discovery and research across multiple disciplines. The initiative aims to enhance the scale and precision of scientific exploration by leveraging advanced AI capabilities.
🧠 Gemini