🧠 AI | ⚪ Neutral | Importance 6/10

Plausibility Is Not Prediction: Contrastive Evidence for LLM-Based Cellular Perturbation Reasoning

arXiv – CS AI|Xinyu Yuan, Xixian Liu, Jianan Zhao, Yashi Zhang, Hongyu Guo, Jian Tang|June 2, 2026 at 04:00 AM

🤖 AI Summary

Researchers demonstrate that large language models fail to accurately predict gene expression changes in cellular perturbation experiments, even though they produce biologically plausible explanations. They introduce CORE, a contrastive evidence method that improves prediction accuracy by organizing evidence from related perturbations rather than evaluating each perturbation in isolation.

Analysis

The research reveals a critical gap between plausibility and accuracy in LLM-based biological prediction systems. While these models generate explanations that sound scientifically reasonable, they systematically overestimate differential expression and often underperform simple baseline models, indicating they rely on general gene response patterns rather than understanding perturbation-specific mechanisms. This distinction matters because it exposes a fundamental limitation in how knowledge-driven AI systems process biological evidence.
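To make the baseline comparison concrete, here is a minimal sketch of one hypothetical "general gene response" predictor: it scores each gene by how often that gene was differentially expressed across training perturbations. The summary does not say which baselines the authors used, so the function names, the toy labels, and the hand-rolled AUROC helper are all assumptions for illustration. The sketch also shows why such a predictor sits at chance on a per-gene metric: within one gene, every perturbation receives the same score.

```python
from collections import defaultdict

def gene_prior_scores(train):
    """train: (perturbation, gene, label) triples; label 1 = differentially expressed.
    Returns, per gene, the fraction of training perturbations that changed it."""
    counts = defaultdict(lambda: [0, 0])  # gene -> [positives, total]
    for _, gene, label in train:
        counts[gene][0] += label
        counts[gene][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # undefined when only one class is present
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_gene_auroc(test, prior):
    """AUROC computed separately for each gene, across held-out perturbations."""
    by_gene = defaultdict(list)
    for _, gene, label in test:
        by_gene[gene].append((prior.get(gene, 0.0), label))
    return {g: auroc([s for s, _ in rows], [y for _, y in rows])
            for g, rows in by_gene.items()}

# Toy, made-up labels with placeholder names (not data from the paper).
train = [("P1", "G1", 1), ("P2", "G1", 0), ("P3", "G1", 1),
         ("P1", "G2", 0), ("P2", "G2", 0)]
test = [("P4", "G1", 1), ("P5", "G1", 1), ("P6", "G1", 0),
        ("P4", "G2", 0), ("P5", "G2", 0), ("P6", "G2", 1)]

prior = gene_prior_scores(train)
pooled = auroc([prior[g] for _, g, _ in test], [y for _, _, y in test])
per_gene = per_gene_auroc(test, prior)

print("pooled AUROC:", pooled)      # above 0.5 on this toy set
print("per-gene AUROC:", per_gene)  # 0.5 for every gene, by construction
```

On this toy set the pooled score looks better than chance while every per-gene score is exactly 0.5, which is one way a model can appear informative without distinguishing between perturbations.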

The problem stems from methodology: existing approaches evaluate each perturbation-gene pair independently, which prevents models from learning how similar perturbations produce different outcomes on the same gene. The CORE framework addresses this by framing prediction as a comparative task, using biomedical knowledge graphs to present both positive and negative examples from related experiments. The reported results show substantial improvements: gains of up to 28.6% on drug-perturbation data, and a rise in per-gene AUROC from chance level to 0.703 across cell lines.
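The comparative framing described above can be sketched roughly as follows. This is not CORE's actual implementation, which the summary does not detail. The retrieval rule, the `related` neighbor map standing in for a knowledge graph, the `observed` outcome table, and the prompt wording are all hypothetical, and the perturbation and gene names are placeholders.

```python
def contrastive_evidence(target, gene, related, observed, k=2):
    """Collect up to k positive and k negative examples for `gene`
    from perturbations related to `target` (hypothetical retrieval rule)."""
    positives, negatives = [], []
    for pert in related.get(target, []):
        label = observed.get((pert, gene))
        if label is None:
            continue  # no measurement for this perturbation-gene pair
        bucket = positives if label == 1 else negatives
        if len(bucket) < k:
            bucket.append(pert)
    return positives, negatives

def format_prompt(target, gene, positives, negatives):
    """Lay out both kinds of evidence side by side before asking the question."""
    lines = [f"Question: does perturbation {target} change expression of gene {gene}?"]
    lines.append("Related perturbations that DID change this gene: "
                 + (", ".join(positives) or "none found"))
    lines.append("Related perturbations that did NOT change this gene: "
                 + (", ".join(negatives) or "none found"))
    lines.append("Compare the target with both groups, then answer yes or no.")
    return "\n".join(lines)

# Toy, made-up inputs (placeholders, not data from the paper).
related = {"P_target": ["P1", "P2", "P3", "P4"]}  # stand-in for knowledge-graph neighbors
observed = {("P1", "G1"): 1, ("P2", "G1"): 0, ("P3", "G1"): 1, ("P4", "G2"): 1}

positives, negatives = contrastive_evidence("P_target", "G1", related, observed)
prompt = format_prompt("P_target", "G1", positives, negatives)
print(prompt)
```

The point of the design is that the model sees, for the same gene, related perturbations with opposite outcomes, so a gene-level prior alone cannot explain the evidence in front of it.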

This research has implications for computational biology and AI development broadly. In drug discovery and precision medicine, accurate perturbation prediction could reduce costly experimental validation. The findings also highlight how prompt design and evidence organization fundamentally shape AI reasoning capabilities, extending beyond biology into other domains requiring causal inference from sparse data. The work suggests that future LLM applications in scientific domains require architectural changes prioritizing contrastive reasoning rather than isolated analysis.

Key Takeaways

→ LLMs produce biologically plausible but inaccurate perturbation predictions, conflating general gene response patterns with true mechanistic understanding.
→ CORE's contrastive evidence approach improves prediction accuracy by up to 28.6% by organizing evidence from related perturbations rather than evaluating pairs in isolation.
→ Current evaluation methods masked model failures because biologically plausible explanations don't guarantee predictive accuracy for unobserved conditions.
→ The research demonstrates that how evidence is organized critically influences LLM reasoning quality in scientific prediction tasks.
→ Contrastive evidence frameworks could enhance LLM performance across domains requiring causal inference from limited experimental data.

#llm-reasoning #cellular-prediction #contrastive-learning #biomedical-ai #gene-expression #perturbation-analysis #knowledge-graphs #scientific-ai

Read Original → via arXiv – CS AI
