You Only Train Once: Differentiable Subset Selection for Omics Data
Researchers introduce YOTO, an end-to-end machine learning framework that simultaneously selects compact gene subsets and performs prediction tasks in single-cell transcriptomic analysis. The differentiable architecture enforces sparsity and uses multi-task learning to improve biomarker discovery while outperforming existing feature selection methods.
YOTO addresses a fundamental bottleneck in genomic research: feature selection and predictive modeling have traditionally operated as disconnected stages. Existing approaches rely on post hoc attribution methods that decouple selection from prediction, which makes it harder to identify meaningful biomarkers in high-dimensional single-cell RNA-seq data. YOTO instead creates a closed feedback loop in which prediction directly guides gene selection while the selected genes shape the learned representation.
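To make the idea of a differentiable selection step concrete, here is a minimal sketch. The summary does not say which relaxation YOTO uses, so this example substitutes a Gumbel-softmax (concrete) selector as a stand-in. The function names, the temperature value, and the toy data are all hypothetical.

```python
import numpy as np

def relaxed_selection(logits, temperature, rng):
    """Training-time selector: each row is a soft one-hot vector over genes.

    Assumption: a Gumbel-softmax relaxation, used here only as an
    illustrative stand-in for the selection layer.
    """
    # Gumbel(0, 1) noise; the lower bound keeps both logs finite.
    gumbel = -np.log(-np.log(rng.uniform(1e-9, 1.0, size=logits.shape)))
    scores = (logits + gumbel) / temperature
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def hard_selection(logits):
    """Inference-time selector: one gene index per selector row."""
    return np.argmax(logits, axis=1)

rng = np.random.default_rng(0)
n_cells, n_genes, k = 8, 50, 5  # toy sizes, not taken from the paper

# Synthetic count matrix standing in for single-cell expression data.
X = rng.poisson(2.0, size=(n_cells, n_genes)).astype(float)
# Selector parameters; in a real model these would be learned.
logits = rng.normal(size=(k, n_genes))

soft = relaxed_selection(logits, temperature=0.5, rng=rng)
X_soft = X @ soft.T            # (n_cells, k): soft mixture used while training
idx = hard_selection(logits)   # k gene indices
X_hard = X[:, idx]             # (n_cells, k): only selected genes reach the predictor
```

In a trainable version, the predictor's loss would be backpropagated through `soft` into `logits`, which is how prediction can steer selection. One caveat of this particular relaxation is that independent selector rows can pick the same gene, so a real implementation would need some way to discourage duplicates.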
The development reflects broader trends in machine learning toward end-to-end differentiable architectures and joint optimization. In genomics specifically, researchers increasingly recognize that biomarker discovery requires coupling feature importance with downstream task performance. YOTO's multi-task learning design enables knowledge sharing across related prediction objectives, allowing partially labeled datasets to inform one another. This is a practical advantage when comprehensive annotations are costly or unavailable.
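One common way to train on partially labeled data across several tasks is to mask out missing labels when computing each task's loss. The sketch below shows that general technique, not YOTO's actual objective. The `-1` missing-label convention, the function name, and the equal task weighting are assumptions.

```python
import numpy as np

def masked_multitask_loss(logits_per_task, labels_per_task):
    """Average cross-entropy over tasks, using only labelled cells per task.

    Assumption: a label of -1 marks a cell with no annotation for that task.
    """
    total, n_tasks_used = 0.0, 0
    for logits, labels in zip(logits_per_task, labels_per_task):
        observed = labels >= 0
        if not observed.any():
            continue  # no labelled cells for this task in the batch
        z = logits[observed]
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        # Negative log-likelihood of the true class for each labelled cell.
        nll = -log_probs[np.arange(observed.sum()), labels[observed]]
        total += nll.mean()
        n_tasks_used += 1
    return total / max(n_tasks_used, 1)

# Toy batch of 4 cells with two hypothetical tasks and disjoint label coverage.
cell_type_logits = np.zeros((4, 3))
cell_type_labels = np.array([0, 2, -1, 1])
condition_logits = np.zeros((4, 2))
condition_labels = np.array([-1, -1, 1, 0])

loss = masked_multitask_loss(
    [cell_type_logits, condition_logits],
    [cell_type_labels, condition_labels],
)
```

Because every task head reads the same selected genes, gradients from each task's labelled cells update the shared selection, even for cells that lack labels for the other tasks.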
For biomarker discovery and precision medicine, this approach reduces computational overhead and training complexity. Compact gene subsets translate directly into cost-effective profiling workflows, reducing sequencing expenses for clinical applications. The framework's ability to discover generalizable gene panels across multiple tasks without retraining downstream classifiers accelerates translation from research to clinical implementation.
Key considerations for adoption include validation on larger, more diverse cohorts beyond the two representative datasets tested. The framework's performance on rare diseases or complex tissues with subtle phenotypic signatures remains to be demonstrated. Success here could catalyze adoption in genomics workflows, particularly for institutions seeking to optimize both accuracy and operational costs in single-cell analysis pipelines.
- YOTO jointly optimizes gene subset selection and prediction in a single differentiable architecture, eliminating weak coupling between feature identification and downstream modeling.
- Multi-task learning enables knowledge sharing across related objectives, allowing partially labeled datasets to improve gene subset discovery without additional training.
- Enforced sparsity ensures only selected genes contribute to inference, reducing computational requirements and enabling cost-effective genomic profiling.
- Framework outperforms state-of-the-art baselines on single-cell RNA-seq datasets, demonstrating improvements in both predictive performance and biomarker interpretability.
- Compact, task-generalizable gene panels reduce clinical implementation costs and accelerate translation from research discovery to precision medicine applications.