ProteinJEPA: Latent prediction complements protein language models
Researchers demonstrate that ProteinJEPA, a latent-space prediction objective, can complement traditional masked language modeling (MLM) in protein language models, improving downstream task performance when the two objectives are combined rather than swapped. The best-performing recipe, masked-position MLM+JEPA, wins 10 out of 16 evaluation tasks against MLM-only baselines under a matched compute budget.
ProteinJEPA addresses a fundamental question in protein AI research: whether latent-space prediction methods can enhance the effectiveness of protein language models beyond standard token-level objectives. Protein language models have become essential tools for understanding protein function and structure, yet their training regimens remain suboptimal. This work systematically evaluates whether combining masked language modeling with JEPA (Joint-Embedding Predictive Architecture) latent prediction yields meaningful improvements without additional computational overhead.
The research builds on recent advances in self-supervised learning for proteins, where models like ESM2 have demonstrated strong performance through MLM alone. The key innovation here is determining the right balance between token-level and latent-level objectives. Rather than replacing MLM entirely with JEPA, the researchers find that predicting latent representations only at masked positions—while retaining MLM's cross-entropy loss—provides the optimal trade-off. This hybrid approach succeeds across 35M to 150M parameter models and shows improvements on biologically meaningful tasks including protein stability prediction, enzyme classification, and fold retrieval.
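To make the hybrid objective concrete, the sketch below shows one way a masked-position MLM+JEPA loss could be wired up in PyTorch. It is a minimal illustration, not the paper's implementation: the names (`context_encoder`, `target_encoder`, `predictor`, `lm_head`, `lambda_jepa`), the smooth-L1 regression loss, and the stop-gradient target encoder are assumptions standing in for details not given in this summary.

```python
import torch
import torch.nn.functional as F

def mlm_jepa_loss(context_encoder, target_encoder, predictor, lm_head,
                  tokens, masked_tokens, mask, lambda_jepa=1.0):
    """Hypothetical masked-position MLM+JEPA objective.

    tokens:        (B, L) original residue ids
    masked_tokens: (B, L) ids with a subset replaced by the mask token
    mask:          (B, L) boolean, True where a token was masked
    """
    # The context encoder sees the corrupted sequence, exactly as in standard MLM.
    h_ctx = context_encoder(masked_tokens)                    # (B, L, D)

    # 1) Token-level objective: cross-entropy over amino-acid identities,
    #    computed only at masked positions.
    logits = lm_head(h_ctx)                                   # (B, L, V)
    mlm_loss = F.cross_entropy(logits[mask], tokens[mask])

    # 2) Latent-level objective: regress the target encoder's embeddings of the
    #    uncorrupted sequence, again only at masked positions (stop-gradient targets).
    with torch.no_grad():
        h_tgt = target_encoder(tokens)                        # (B, L, D)
    pred = predictor(h_ctx)                                   # (B, L, D)
    jepa_loss = F.smooth_l1_loss(pred[mask], h_tgt[mask])

    # Retain the MLM term and add the latent term, rather than replacing one with the other.
    return mlm_loss + lambda_jepa * jepa_loss
```

In JEPA-style setups the target encoder is typically maintained as an exponential moving average of the context encoder; whether ProteinJEPA follows that convention exactly is not stated in this summary.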
For the protein research community and biotech developers, these findings have practical implications. Improved protein language models directly enhance applications in drug discovery, enzyme engineering, and protein design. The consistency of gains across multiple downstream tasks, particularly fitness prediction and homology detection, suggests the method addresses genuine limitations in current models. However, mixed results when pretraining from scratch indicate that the technique's effectiveness may depend on initialization conditions or dataset characteristics. The work positions JEPA as a complement to MLM rather than a replacement for it: practitioners should keep MLM-based training and selectively add latent prediction for specific use cases.
- Masked-position MLM+JEPA outperforms pure MLM on 10 of 16 downstream protein tasks under equivalent computational budgets
- The optimal recipe combines token-level and latent-level objectives rather than replacing one with the other
- Improvements emerge across critical biological tasks including protein stability, enzyme fitness, and fold classification
- All-position JEPA variants and JEPA-only approaches underperform, indicating careful architectural choices matter significantly (see the sketch after this list)
- Results generalize across model scales (35M-150M parameters) and pretrained ESM2 variants
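For intuition on the variant comparison, the toy snippet below contrasts which positions feed the latent regression term in the masked-position and all-position settings; the tensors and the "every fifth residue" masking pattern are purely illustrative.

```python
import torch
import torch.nn.functional as F

B, L, D = 2, 12, 16                         # toy batch size, sequence length, hidden size
pred  = torch.randn(B, L, D)                # predictor outputs from the context encoder
h_tgt = torch.randn(B, L, D)                # target-encoder embeddings (treated as constants)
mask = torch.zeros(B, L, dtype=torch.bool)
mask[:, ::5] = True                         # toy masking pattern: every fifth position

# Masked-position variant (the winning recipe): latent loss only where tokens were masked.
masked_position_loss = F.smooth_l1_loss(pred[mask], h_tgt[mask])

# All-position variant (reported to underperform): latent loss at every position.
all_position_loss = F.smooth_l1_loss(pred, h_tgt)

# A JEPA-only variant (also reported to underperform) would drop the MLM cross-entropy entirely.
```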