#biological-ai News & Analysis

13 articles tagged with #biological-ai. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · Jun 23 · 7/10

BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language

Researchers introduce BioMatrix, a multimodal foundation model that integrates molecular sequences, structures, protein data, and natural language within a single decoder-only architecture. The model achieves state-of-the-art performance on 77 of 80 downstream tasks, demonstrating that a unified generalist AI can match or exceed specialized biological tools across diverse applications.

AI · Bearish · arXiv – CS AI · Jun 4 · 7/10

Retrieval and competition: how a protein foundation model starts a protein

Researchers traced how ESM2-8M, a protein language model, predicts that proteins begin with methionine—a near-universal biological rule. The analysis reveals the model doesn't recognize methionine through direct evidence detection, but rather retrieves it via a distributed computational circuit anchored at the sequence start token. Critically, the model fails on sequences where biology diverges from the statistical default, suggesting that model confidence may not reflect genuine biological understanding.

AI · Neutral · arXiv – CS AI · May 29 · 7/10

BioArc: Discovering Optimal Neural Architectures for Biological Foundation Models

BioArc introduces a neural architecture search framework that systematically discovers optimal model architectures for biological foundation models, moving beyond generic adaptation of NLP and computer vision models. The research identifies design principles and proposes methods to predict architectures for new biological tasks, providing foundational methodology for next-generation biology-focused AI systems.

AI · Bullish · arXiv – CS AI · Mar 9 · 7/10

Predictive Coding Networks and Inference Learning: Tutorial and Survey

Researchers present a comprehensive survey of Predictive Coding Networks (PCNs), a neuroscience-inspired AI approach that uses biologically plausible inference learning instead of traditional backpropagation. PCNs can achieve higher computational efficiency with parallelization and offer a more versatile framework for both supervised and unsupervised learning compared to traditional neural networks.

AI · Neutral · arXiv – CS AI · Mar 5 · 7/10

Inference-Time Toxicity Mitigation in Protein Language Models

Researchers developed Logit Diff Amplification (LDA) as an inference-time safety mechanism for protein language models to prevent toxic protein generation. The method reduces predicted toxicity rates while maintaining biological plausibility and structural viability, addressing dual-use safety concerns in AI-driven protein design.

AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 4

The Information-Theoretic Imperative: Compression and the Epistemic Foundations of Intelligence

Researchers propose the Compression Efficiency Principle (CEP) to explain why artificial neural networks and biological brains develop similar representations despite different substrates. The theory suggests both systems converge on efficient compression strategies that encode stable invariants rather than unstable correlations, providing a unified framework for understanding intelligence across biological and artificial systems.

AI · Neutral · arXiv – CS AI · Jun 23 · 6/10

Brain-Inspired Stochastic Joint Embedding Representation Learning

Researchers introduce PhiNet v2, a brain-inspired machine learning architecture that learns visual representations from temporal image sequences without heavy data augmentation, achieving competitive performance with state-of-the-art models while mimicking biological visual processing more closely.

AI · Neutral · arXiv – CS AI · Jun 9 · 6/10

Knowledge Graphs and Reasoning LLMs for Finding Simple Yet Effective Transcriptomic Perturbation Predictors

Researchers demonstrate that simple K-nearest neighbor models leveraging biological knowledge graphs achieve competitive performance in predicting gene knockout effects on transcriptomic expression, with reinforcement learning-optimized LLMs further improving results to match state-of-the-art methods. This work suggests knowledge graphs serve as effective model priors for complex biological prediction tasks.

AI · Neutral · arXiv – CS AI · Jun 2 · 6/10

Planktonzilla: Multimodal dataset and models for understanding plankton ecosystems

Researchers introduce Planktonzilla-17M, the largest unified plankton image dataset with 17.4 million images across 602 taxonomic classes from thirteen imaging systems. The work demonstrates that supervised learning with taxonomic lineage outperforms CLIP-style training and reveals limitations in current biological foundation models like BioCLIP for marine imaging applications.

AI · Neutral · arXiv – CS AI · May 28 · 6/10

Verifiable Benchmarking of Long-Horizon Spatial Biology

Researchers introduced SpatialBench-Long, a comprehensive benchmark testing AI agents' ability to conduct end-to-end scientific reasoning on complex spatial biology data without prescribed methods. The benchmark spans 24 evaluations across multiple cancer and aging systems using diverse measurement technologies. Current leading models achieve only an 11.1% success rate, revealing significant limitations in AI's capacity for autonomous biological discovery.

🏢 OpenAI · 🧠 GPT-5 · 🧠 Gemini

AI · Bullish · arXiv – CS AI · Mar 17 · 6/10

ES-Merging: Biological MLLM Merging via Embedding Space Signals

Researchers propose ES-Merging, a new framework for combining specialized biological multimodal large language models (MLLMs) by using embedding space signals rather than traditional parameter-based methods. The approach estimates merging coefficients at both layer-wise and element-wise granularities, outperforming existing merging techniques and even task-specific fine-tuned models on cross-modal scientific problems.

AI · Bullish · arXiv – CS AI · Feb 27 · 6/10 · 4

Multi-Dimensional Spectral Geometry of Biological Knowledge in Single-Cell Transformer Representations

Researchers decoded the internal representations of scGPT, a single-cell foundation model, revealing it organizes genes into interpretable biological coordinate systems rather than opaque features. The model encodes cellular organization patterns including protein localization, interaction networks, and regulatory relationships across its transformer layers.

AI · Neutral · arXiv – CS AI · Mar 3 · 4/10 · 3

Synaptic bundle theory for spike-driven sensor-motor system: More than eight independent synaptic bundles collapse reward-STDP learning

Researchers developed a spike-driven sensor-motor system that identifies critical limits for neuronal learning. The study found that learning collapses when the number of motor neurons or independent synaptic bundles exceeds certain thresholds, providing insights into biological spike-based control mechanisms.