#scientific-ai News & Analysis

53 articles tagged with #scientific-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 25 · 7/10

CauScale: Neural Causal Discovery at Scale

CauScale is a neural architecture for causal discovery, a core capability for scientific AI and data analysis, that scales to graphs with up to 1,000 nodes. The system achieves 99.6% accuracy on standard benchmarks while running inference 4x to 13,000x faster than existing methods, easing the computational bottlenecks that previously limited causal discovery to smaller datasets.

AI · Bullish · Crypto Briefing · Jun 24 · 7/10

Mirendil secures $200M seed round led by a16z and Nvidia

Mirendil has secured a $200M seed round led by prominent investors a16z and Nvidia, underscoring significant institutional confidence in AI-driven scientific research. The funding round reflects a broader market shift toward companies leveraging artificial intelligence for breakthrough scientific discoveries and applications.

🏢 Nvidia

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

ARIA: A Causal-Aware Framework for Rescuing LLM Reasoning in Trustworthy Materials Discovery

Researchers introduce ARIA, a causal-aware framework that improves how Large Language Models reason about materials discovery by addressing 'contextual tunneling', a bias in which models over-rely on narrow retrieved evidence. ARIA uses a three-tier approach combining direct causal reasoning, physics-informed analogies, and parametric fallbacks. It was validated on a knowledge graph of 2,839 materials relations and is aimed at more trustworthy and auditable AI-assisted scientific discovery.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

GyroSwin: 5D Surrogates for Gyrokinetic Plasma Turbulence Simulations

Researchers introduce GyroSwin, a neural surrogate model that simulates 5D gyrokinetic plasma turbulence with a 1,000x gain in computational efficiency while capturing nonlinear physics effects. The model combines hierarchical Vision Transformers with cross-attention mechanisms to predict turbulent heat transport more accurately than traditional reduced-order models, advancing nuclear fusion energy research.

AI · Bullish · arXiv – CS AI · Jun 10 · 7/10

MMClima: A Framework for Multimodal Climate Science Data and Evaluation

Researchers introduce MMClima, a large-scale multimodal framework containing 104k+ expert-validated QA pairs for climate science across text, video, and figures. The project benchmarks state-of-the-art multimodal AI models and releases a fine-tuned baseline model, evaluation tools, and dataset to standardize climate science AI evaluation.

AI · Neutral · arXiv – CS AI · Jun 9 · 7/10

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

Researchers introduced ResearchClawBench, a comprehensive benchmark with 40 tasks across 10 scientific domains designed to evaluate AI agents' ability to conduct autonomous scientific research. Current leading systems like Claude Code and Claude-Opus-4 score only 20-21.5 points, revealing significant gaps in experimental design, evidence synthesis, and scientific reasoning capabilities.

🧠 Claude

AI · Bullish · Crypto Briefing · Jun 7 · 7/10

Anthropic’s Claude Opus 4.7 matches dedicated NMR software in chemistry tasks

Anthropic's Claude Opus 4.7 AI model has demonstrated performance comparable to dedicated NMR (nuclear magnetic resonance) software in chemistry analysis tasks. This development could streamline chemical research workflows by reducing dependency on specialized, expensive software tools and proprietary datasets.

🏢 Anthropic · 🧠 Claude · 🧠 Opus

AI · Bullish · arXiv – CS AI · Jun 4 · 7/10

Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have

Researchers propose FINO, a label-free method for adapting vision foundation models to specialized scientific domains using existing metadata rather than expensive labeled datasets. The approach combines self-supervised learning with metadata guidance, demonstrating superior performance across microscopy, Earth observation, and medical imaging compared to both unsupervised and fully supervised alternatives.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

AIBuildAI-2: A Knowledge-Enhanced Agent for Automatically Building AI Models

AIBuildAI-2 introduces a knowledge-enhanced AI agent that automatically builds machine learning models by combining large language models with an external, evolving knowledge system. The system achieves state-of-the-art performance, ranking first on MLE-Bench and placing in the top 6.6% of human teams in a predictive competition, democratizing AI model development for non-specialists.

AI · Bullish · arXiv – CS AI · May 28 · 7/10

MolLingo: Molecule-Native Representations for LLM-Powered Scientific Agents

Researchers introduce MolLingo, a multi-agent AI system that automates molecular design by coordinating specialized agents through shared memory and domain-specific tools. The system uses BRICS-based Fragment Enumeration to represent molecules in chemically meaningful ways that LLMs can reason about effectively, achieving superior performance on drug design benchmarks compared to frontier models like GPT-5.

🧠 GPT-5

AI · Bullish · arXiv – CS AI · May 27 · 7/10

AutoDFT: A Closed-Loop Multi-Agent Framework for Autonomous DFT Calculations

AutoDFT is a closed-loop multi-agent framework that automates density functional theory (DFT) calculations by embedding LLM reasoning throughout the entire computational lifecycle, rather than just the planning phase. The system achieves 94.1% success on a 34-task benchmark and enables non-experts to obtain reliable computational chemistry results by dynamically adapting to failures and unexpected outcomes.

🧠 GPT-5

AI · Bullish · arXiv – CS AI · May 27 · 7/10

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

ScientistOne introduces Chain-of-Evidence, a verifiability framework addressing critical failures in autonomous research systems where AI agents produce plausible-looking but unreliable outputs including fabricated citations, unverified scores, and misaligned methods. The system achieves zero hallucinated references and perfect score verification across five research tasks, significantly outperforming existing baseline systems that exhibit systematic failure rates up to 80%.

AI · Bullish · MIT Technology Review · May 22 · 7/10

Google I/O showed how the path for AI-driven science is shifting

During Google I/O, DeepMind CEO Demis Hassabis stated we are approaching the "singularity," signaling that AI-driven scientific advancement is accelerating rapidly. The keynote highlighted Google's positioning of AI as a transformative force for research and development across industries.

🏢 Google

AI · Bearish · arXiv – CS AI · May 12 · 7/10

MDGYM: Benchmarking AI Agents on Molecular Simulations

Researchers introduced MDGYM, a benchmark testing AI agents' ability to autonomously execute molecular dynamics simulations, finding that even the strongest systems solve only 21% of easy tasks. The poor performance reveals that advanced code generation does not translate to physical reasoning, exposing a critical gap between general software engineering competence and domain-specific scientific workflows.

🧠 Claude

AI · Bullish · arXiv – CS AI · Apr 10 · 7/10

Scientific Knowledge-driven Decoding Constraints Improving the Reliability of LLMs

Researchers propose SciDC, a method that constrains large language model outputs using subject-specific scientific rules to reduce hallucinations and improve reliability. The approach demonstrates 12% average accuracy improvements across domain tasks including drug formulation, clinical diagnosis, and chemical synthesis planning.

AI · Bullish · arXiv – CS AI · Mar 11 · 7/10

Logos: An evolvable reasoning engine for rational molecular design

Researchers introduce Logos, a compact AI model that combines multi-step logical reasoning with chemical consistency for molecular design. The model achieves strong performance in structural accuracy and chemical validity while using fewer parameters than larger language models, and provides transparent reasoning that can be inspected by humans.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery

Researchers introduce MMAI Gym for Science, a training framework for molecular foundation models in drug discovery. Their Liquid Foundation Model (LFM) outperforms larger general-purpose models on drug discovery tasks while being more efficient and specialized for molecular applications.

AI · Bullish · arXiv – CS AI · Mar 5 · 7/10

Phi-4-reasoning-vision-15B Technical Report

Researchers released Phi-4-reasoning-vision-15B, a compact open-weight multimodal AI model that combines vision and language capabilities with strong performance in scientific and mathematical reasoning. The model demonstrates that careful architecture design and high-quality data curation can enable smaller models to achieve competitive performance with fewer computational resources.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Discovery

Researchers have developed DBench-Bio, a dynamic benchmark system that automatically evaluates AI's ability to discover new biological knowledge using a three-stage pipeline of data acquisition, question-answer extraction, and quality filtering. The benchmark addresses the critical problem of data contamination in static datasets and provides monthly updates across 12 biomedical domains, revealing current limitations in state-of-the-art AI models' knowledge discovery capabilities.

AI · Bullish · Google DeepMind Blog · Oct 9 · 7/10 · 5

Demis Hassabis & John Jumper awarded Nobel Prize in Chemistry

Demis Hassabis and John Jumper have been awarded the Nobel Prize in Chemistry for developing AlphaFold, an AI system that predicts 3D protein structures from amino acid sequences. This recognition highlights the transformative impact of AI in scientific research and drug discovery.

AI · Neutral · arXiv – CS AI · Jun 25 · 6/10

Geo-Strat-RL: Learning Geological Event Reasoning from Verifiable Tasks

Researchers present Geo-Strat-RL, a synthetic environment that trains vision-language models to reason about geological histories through reinforcement learning with verifiable rewards. The system demonstrates that geological reasoning learned from stratigraphic diagrams can transfer to seismic data without domain-specific training, suggesting AI models can learn generalizable geological principles across different observation formats.

AI · Neutral · arXiv – CS AI · Jun 11 · 6/10

Sparse probes and murky physics: a case study of interpretability challenges in a foundation model for continuum dynamics

Researchers applied mechanistic interpretability techniques to Walrus, a foundation model for continuum dynamics, using sparse autoencoders to probe internal mechanisms. The study reveals inconsistent feature alignment with known physics and systematic discrepancies in model outputs, highlighting fundamental challenges in understanding and validating scientific AI systems.

AI · Neutral · arXiv – CS AI · Jun 10 · 6/10

SocraticPO: Policy Optimization via Interactive Guidance

SocraticPO is a new reinforcement learning framework that improves large language model training by combining natural-language teacher guidance with reward decay, rather than relying solely on scalar outcome rewards. The method shows improvements on scientific reasoning benchmarks while preventing models from exploiting teacher assistance as a shortcut to rewards.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

DN-Hypo-Pipeline: An AI-Driven Workflow for Hypothesis Generation via Large Language Models and Scientific Explanations

Researchers introduce DN-Hypo-Pipeline, an AI workflow leveraging large language models to automate scientific hypothesis generation from existing research literature. The system reconstructs novel explanations for observed phenomena and was validated in data science modeling, with two generated hypotheses producing algorithms that outperformed baseline models from the original papers.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Graph2Idea: Retrieval-Augmented Scientific Idea Generation with Graph-Structured Contexts

Researchers propose Graph2Idea, an AI framework that uses knowledge graphs to improve scientific idea generation by converting retrieved papers into structured knowledge relationships rather than flat text. The method demonstrates significant improvements in novelty, quality, and feasibility of generated research ideas compared to existing LLM-based approaches.
