
#scientific-ai News & Analysis

53 articles tagged with #scientific-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.


AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

PhysScene: A Scene Graph Dataset for Scientific Visual Reasoning in Physics Experiments

Researchers introduce PhysScene, the first scene graph dataset specifically designed for physics experiments, enabling AI systems to understand complex scientific setups through structured visual reasoning. The dataset prioritizes semantic accuracy and relational density over scale, addressing a gap in domain-specific AI training data for scientific applications.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Plausibility Is Not Prediction: Contrastive Evidence for LLM-Based Cellular Perturbation Reasoning

Researchers demonstrate that large language models fail to accurately predict gene expression changes in cellular perturbation experiments despite producing biologically plausible explanations. They introduce CORE, a contrastive learning method that significantly improves prediction accuracy by organizing evidence from related perturbations rather than evaluating them in isolation.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

AblationBench: Evaluating Automated Planning of Ablations in Empirical AI Research

Researchers introduce AblationBench, a benchmark suite for evaluating language model agents on ablation planning tasks in AI research. The study finds that frontier LMs achieve only 45% accuracy on average, significantly below human performance, highlighting challenges in automating scientific research methodologies.

🏢 Hugging Face

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Are LLMs Ready for Neural-integrated Mechanistic Modeling? A Benchmark and Agentic Framework

Researchers introduce NIMM, a benchmark for evaluating large language models' ability to construct neural-integrated mechanistic models that combine traditional scientific equations with neural networks. They propose NIMMGen, an agentic framework using tree-guided search that significantly outperforms existing LLM approaches on this complex modeling task across three scientific domains.

AI · Bullish · arXiv – CS AI · Jun 1 · 6/10

PhyDrawGen: Physically Grounded Diagram Generation from Natural Language

PhyDrawGen is a neuro-symbolic AI system that generates physics diagrams from natural language text while maintaining strict physical accuracy. By combining large language models, deterministic solvers, and vision-language models in a pipeline, it overcomes the hallucination problems of current generative models and outperforms GPT-4, Gemini 2.5, and Gemini 3 Pro on physics problems spanning mechanics, optics, and electromagnetism.

🧠 GPT-5 · 🧠 Gemini

AI · Neutral · arXiv – CS AI · May 29 · 6/10

CrystalXRD-Bench: Benchmarking Vision-Language Models for XRD Peak Indexing Across Diverse Crystalline Materials

Researchers introduced CrystalXRD-Bench, a 250-sample benchmark dataset for evaluating vision-language models on crystallographic peak indexing from X-ray diffraction patterns. Of the seven leading VLMs tested, the best achieved only 37.6% exact-match accuracy, revealing significant gaps in how AI systems handle precise scientific figure interpretation and multi-step reasoning.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · May 29 · 6/10

OmniMatBench: A Human-Calibrated Multimodal Reasoning Benchmark Across 19 Materials Science Subfields

Researchers introduced OmniMatBench, a comprehensive multimodal reasoning benchmark containing 3,171 expert-curated problems across 19 materials science subfields. Evaluation of 13 major language models revealed significant gaps in AI reasoning capabilities, with the best model achieving only 37.2% accuracy, highlighting the need for improved scientific AI systems.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

Predicting Causal Effects from Natural Language Queries using Structured Representations

Researchers introduce Query2Effect, a 72,000-question benchmark for predicting causal effect sizes from natural language queries using LLMs. A two-step framework combining structured representation generation with supervised encoding reduces prediction error by 27-71% compared to standard LLMs, demonstrating that separating semantic interpretation from numerical estimation improves both in-domain performance and out-of-domain generalization.

AI · Neutral · arXiv – CS AI · May 29 · 6/10

CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists

Researchers introduce CausaLab, a benchmarking environment that tests whether LLM agents can both solve causal discovery problems and accurately recover the underlying causal mechanisms. Experiments reveal a significant gap between prediction accuracy (92%) and structural causal model recovery (0.471 F1 score), exposing limitations in current AI systems' ability to perform rigorous scientific reasoning.

🧠 GPT-5

AI · Neutral · arXiv – CS AI · May 29 · 6/10

AtomWorld: A Benchmark for Evaluating Spatial Reasoning in Large Language Models on Crystalline Materials

Researchers introduced AtomWorld, a benchmark for evaluating how well large language models can perform spatial reasoning tasks in materials science, specifically atomic structure manipulation. The study reveals that current LLMs like Claude Opus 4.6 struggle with complex spatial operations, achieving success rates below 12% for rotation tasks, suggesting they function better as collaborative tools than autonomous scientific agents.

🧠 Claude · 🧠 Opus

AI · Neutral · arXiv – CS AI · May 28 · 6/10

MetaboT: An LLM-based Multi-Agent Framework for Interactive Analysis of Mass Spectrometry Metabolomics Knowledge Graphs

MetaboT is an open-source LLM-based framework that translates natural-language questions into SPARQL queries for metabolomics knowledge graphs, significantly lowering technical barriers for researchers without programming expertise. The multi-agent architecture addresses hallucination and schema-compliance issues through specialized agents for validation, entity resolution, and query refinement, validated on the Experimental Natural Products Knowledge Graph.

AI · Bullish · Google DeepMind Blog · May 12 · 6/10

Co-Scientist: A multi-agent AI partner to accelerate research

Google has introduced Co-Scientist, a multi-agent AI system built on Gemini designed to assist researchers in accelerating scientific discovery. The tool represents a significant step in applying large language models to collaborative research workflows, potentially transforming how scientists approach complex problems.


🧠 Gemini

AI · Neutral · arXiv – CS AI · May 12 · 6/10

ASIA: an Autonomous System Identification Agent

ASIA is an autonomous AI agent framework that automates system identification tasks by delegating model selection, training algorithms, and hyperparameter tuning to a large language model. The framework eliminates manual trial-and-error processes in dynamical systems modeling, though empirical testing reveals concerns around test leakage and reproducibility.

AI · Bullish · arXiv – CS AI · May 12 · 6/10

Can LLMs Predict Polymer Physics Just by Reading Synthesis and Processing Prose?

Researchers introduced PolyLM, a 9-billion-parameter language model that predicts polymer physical and mechanical properties directly from scientific literature without requiring structural chemical data. The model achieved a median R² of 0.74 across 22 diverse properties by training on 185,000 papers and 276,400 polymer samples, demonstrating that natural language processing can effectively capture the experimental context that traditional structure-only models miss.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Can Agents Price a Reaction? Evaluating LLMs on Chemical Cost Reasoning

Researchers introduce ChemCost, a benchmark for evaluating LLM agents on chemical cost estimation from reaction descriptions. The study reveals that even frontier LLMs achieve only 50.6% accuracy on clean inputs and degrade significantly with realistic noise, exposing brittleness in parsing, evidence integration, and tool use despite access to domain-specific tools.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation

LithoBench introduces a comprehensive benchmark dataset for evaluating large multimodal models on remote-sensing lithology interpretation, containing 10,000 expert-annotated instances across cognitive levels from identification to reasoning. The research reveals significant gaps in current vision-language models' ability to handle knowledge-intensive geological tasks, highlighting the challenges of applying general-purpose AI to specialized domain expertise.

AI · Neutral · arXiv – CS AI · May 11 · 6/10

Adapting Vision-Language Models for Neutrino Event Classification in High-Energy Physics

Researchers have successfully adapted Vision-Language Models (VLMs) based on LLaMA 3.2 to classify neutrino events in high-energy physics detector data, demonstrating that transformer-based architectures outperform traditional CNNs while offering superior interpretability. This work showcases the broader applicability of large multimodal AI models beyond natural language processing to specialized scientific domains.

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

LABBench2: An Improved Benchmark for AI Systems Performing Biology Research

Researchers have released LABBench2, an upgraded benchmark with nearly 1,900 tasks designed to measure AI systems' real-world capabilities in biology research beyond theoretical knowledge. Current frontier models score 26-46% lower on it than on the original LAB-Bench, which suggests the new benchmark is substantially harder and leaves considerable room for improvement.

$OP · 🏢 Hugging Face

AI · Neutral · arXiv – CS AI · Apr 14 · 6/10

COMPOSITE-STEM

Researchers introduced COMPOSITE-STEM, a new benchmark containing 70 expert-written scientific tasks across physics, biology, chemistry, and mathematics to evaluate AI agents. The top-performing model achieved only 21% accuracy, indicating the benchmark effectively measures capabilities beyond current AI reach and addresses the saturation of existing evaluation frameworks.

AI · Neutral · arXiv – CS AI · Mar 11 · 6/10

OPENXRD: A Comprehensive Benchmark Framework for LLM/MLLM XRD Question Answering

Researchers introduced OPENXRD, a comprehensive benchmarking framework for evaluating large language models and multimodal LLMs in crystallography question answering. The study tested 74 state-of-the-art models and found that mid-sized models (7B-70B parameters) benefit most from contextual materials, while very large models often show saturation or interference.

🧠 GPT-4 · 🧠 GPT-4.5 · 🧠 GPT-5

AI · Bullish · arXiv – CS AI · Mar 3 · 6/10 · 8

MicroVerse: A Preliminary Exploration Toward a Micro-World Simulation

Researchers introduce MicroVerse, a specialized AI video generation model for microscale biological simulations, addressing limitations of current video generation models in scientific applications. The work includes MicroWorldBench benchmark and MicroSim-10K dataset, targeting biomedical applications like drug discovery and educational visualization.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 8

SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond

Researchers introduce SafeSci, a comprehensive framework for evaluating safety in large language models used for scientific applications. The framework includes a 0.25M sample benchmark and 1.5M sample training dataset, revealing critical vulnerabilities in 24 advanced LLMs while demonstrating that fine-tuning can significantly improve safety alignment.

AI · Bearish · arXiv – CS AI · Mar 2 · 6/10 · 17

CMT-Benchmark: A Benchmark for Condensed Matter Theory Built by Expert Researchers

Researchers created CMT-Benchmark, a new dataset of 50 expert-level condensed matter theory problems to evaluate large language models' capabilities in advanced scientific research. The best-performing model (GPT-5) solved only 30% of the problems, and the average across 17 models was just 11.4%, highlighting significant gaps in current AI's physical reasoning abilities.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 14

Carré du champ flow matching: better quality-generalisation tradeoff in generative models

Researchers introduce Carré du champ flow matching (CDC-FM), a new generative AI model that improves the quality-generalization tradeoff by using geometry-aware noise instead of standard uniform noise. The method shows significant improvements in data-scarce scenarios and non-uniformly sampled datasets, particularly relevant for AI applications in scientific domains.

AI · Bullish · arXiv – CS AI · Mar 2 · 7/10 · 19

VCWorld: A Biological World Model for Virtual Cell Simulation

Researchers have developed VCWorld, a new AI-powered biological simulation system that combines large language models with structured biological knowledge to predict cellular responses to drug perturbations. The system operates as a 'white-box' model, providing interpretable predictions and mechanistic insights while achieving state-of-the-art performance in drug perturbation benchmarks.
