AI · Bullish · arXiv – CS AI · Mar 3 · 7/10 · 4
🧠 GeneZip is a new DNA compression model that achieves 137.6× compression with minimal performance loss by exploiting the fact that genomic information is highly imbalanced. This lets much larger AI models for genomic analysis be trained on a single GPU instead of expensive multi-GPU configurations.
AI · Bullish · arXiv – CS AI · 4d ago · 6/10
🧠 Frontier large language models from Anthropic and OpenAI have demonstrated competitive performance with human experts at annotating natural phenotypes to ontology terms, a previously labor-intensive bottleneck in biological research. When evaluated against the same Gold Standard benchmark used in a 2018 study, these AI agents performed within the range of trained human curators and substantially outperformed prior NLP tools, suggesting significant potential to scale phenotype annotation workflows.
🏢 OpenAI · 🏢 Anthropic
AI · Neutral · arXiv – CS AI · 4d ago · 5/10
🧠 TaxDistill introduces a knowledge distillation framework that uses GenomeOcean, a 500M-parameter genomic foundation model, to improve metagenomic taxonomic annotation by reducing the label noise produced by sequence-similarity tools. On gastrointestinal datasets, the approach improves F1 scores by 23.3% over traditional methods.
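The summary does not spell out TaxDistill's training objective. As a generic illustration only, one common way to let a teacher dampen noisy labels is to train the student against a blend of the one-hot (possibly wrong) label and the teacher's temperature-softened distribution. The sketch below assumes that formulation; `distillation_target`, `alpha`, and `temperature` are hypothetical names and defaults, and GenomeOcean is not modeled (the teacher is just a list of logits).

```python
from math import exp, log


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_target(noisy_label, teacher_logits, alpha=0.5, temperature=2.0):
    """Blend a possibly noisy hard label with the teacher's softened distribution.

    Hypothetical defaults: alpha and temperature are not taken from TaxDistill.
    """
    soft = softmax(teacher_logits, temperature)
    return [alpha * (1.0 if i == noisy_label else 0.0) + (1 - alpha) * p
            for i, p in enumerate(soft)]


def cross_entropy(target, student_logits):
    """Cross-entropy of the student's prediction against a soft target."""
    probs = softmax(student_logits)
    # Terms with zero target weight contribute nothing, so skip them.
    return -sum(t * log(p) for t, p in zip(target, probs) if t > 0)


target = distillation_target(1, [0.0, 0.0], alpha=0.5, temperature=2.0)
loss = cross_entropy(target, [0.0, 0.0])
```

With `alpha=1.0` the target collapses to the noisy hard label; lowering `alpha` shifts weight toward the teacher's belief, which is the general mechanism by which distillation can soften label noise.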
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠 Researchers propose L3-PPI, a biologically informed machine learning approach for predicting protein-protein interactions that leverages the L3 rule: the principle that multiple length-3 paths between two proteins indicate that they are likely to interact. The method adds a lightweight graph prompt learning module to existing PPI predictors as a plug-and-play component and outperforms conventional approaches that rely on generic aggregation methods.
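The L3 rule can be made concrete with a small sketch. It assumes the interaction network is an undirected edge list and scores each pair of proteins that are not yet linked by counting simple paths x-u-v-y, plus a degree-normalized variant that weights each path by 1/sqrt(deg(u)·deg(v)), following the commonly cited L3 link-prediction formulation. `l3_scores` is a hypothetical helper; it does not reproduce L3-PPI's graph prompt learning module.

```python
from math import sqrt


def l3_scores(edges):
    """Score unlinked protein pairs by their length-3 paths.

    Returns two dicts keyed by (x, y) with x < y:
      raw[(x, y)]  = number of simple paths x-u-v-y
      norm[(x, y)] = sum over those paths of 1 / sqrt(deg(u) * deg(v))
    """
    # Undirected adjacency sets for each protein in the interaction network.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    raw, norm = {}, {}
    for x in adj:
        for u in adj[x]:
            for v in adj[u]:
                if v == x:
                    continue  # x-u-x is a backtrack, not a simple path
                for y in adj[v]:
                    # Skip paths that revisit u, count each path once (x < y),
                    # and skip pairs that already interact (nothing to predict).
                    if y == u or x >= y or y in adj[x]:
                        continue
                    key = (x, y)
                    raw[key] = raw.get(key, 0) + 1
                    norm[key] = norm.get(key, 0.0) + 1 / sqrt(len(adj[u]) * len(adj[v]))
    return raw, norm


# A six-protein ring: only opposite proteins are joined by length-3 paths.
ring = [("X", "U1"), ("U1", "V1"), ("V1", "Y"), ("Y", "V2"), ("V2", "U2"), ("U2", "X")]
raw, norm = l3_scores(ring)
```

In the ring, X and Y are connected by two length-3 paths, so they rank as a candidate interaction. The raw count grows with the degree of intermediate hubs, which is why the degree-normalized score is often preferred.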
AI · Neutral · arXiv – CS AI · May 12 · 6/10
🧠 Researchers present a machine learning framework that combines DNA sequence analysis with graph neural networks to predict biological age from methylation patterns, achieving a 12.8% improvement over existing methods. Instead of deep learning, the approach uses handcrafted sequence features to encode biological context, which offers practical advantages in aging research applications.
AI · Neutral · arXiv – CS AI · May 11 · 6/10
🧠 Researchers introduce OmicsLM, a multimodal large language model that interprets transcriptomic data by combining quantitative gene expression profiles with natural language processing. Trained on 5.5 million examples across 70 task types, the model outperforms specialized omics tools and general LLMs on language-guided biological reasoning tasks, advancing AI applications in genomic research.
AI · Neutral · arXiv – CS AI · Mar 3 · 7/10 · 6
🧠 Researchers introduce ProtRLSearch, a multi-round protein search agent that uses reinforcement learning and multimodal inputs (protein sequences and text) to improve protein analysis for healthcare applications. The system addresses the limitations of single-round, text-only protein search agents and includes a new benchmark, ProtMCQs, with 3,000 multiple-choice questions for evaluation.
AI · Neutral · arXiv – CS AI · Mar 3 · 5/10 · 3
🧠 Researchers introduce Protap, a comprehensive benchmark comparing protein modeling approaches across realistic applications. The study finds that large-scale pretrained models often underperform supervised encoders on small datasets, while structural information and domain-specific biological knowledge can enhance specialized protein tasks.
AI · Neutral · arXiv – CS AI · Mar 17 · 5/10
🧠 Researchers developed a comprehensive benchmark for evaluating AI agents on single-cell omics analysis, covering 50 real-world tasks across multiple agent frameworks. The study found that Grok3-beta achieved state-of-the-art performance, and that multi-agent frameworks significantly outperformed single-agent approaches by dividing work into specialized roles.
🧠 Grok
AI · Neutral · arXiv – CS AI · Feb 27 · 4/10 · 6
🧠 Researchers developed MEDNA-DFM, a dual-view deep learning model that predicts DNA methylation patterns while providing biological explanations. The model achieves high accuracy across species and includes explainable AI features that reveal conserved genetic motifs and cooperative sequence-structure relationships.
AI · Neutral · arXiv – CS AI · Feb 27 · 4/10 · 7
🧠 Researchers developed UTR-STCNet, a new Transformer-based AI model that analyzes variable-length genetic sequences to predict protein translation efficiency. The model outperformed existing methods and can identify important regulatory elements in mRNA sequences, potentially advancing therapeutic mRNA design.