Protein contacts are already in the attention: a single-forward-pass alternative to the Categorical Jacobian

arXiv – CS AI | Rome Thorstenson
🤖 AI Summary

Researchers demonstrate that protein contacts can be read directly from language model attention heads in a single forward pass, outperforming the computationally expensive Categorical Jacobian method on leakage-clean test data. The findings reveal that contact information is concentrated in a small subset of attention heads, and that only 10 labeled proteins are needed to select those heads.

Analysis

This research addresses computational efficiency in protein contact prediction by challenging the Categorical Jacobian approach, which requires approximately 19L forward passes per protein (L here appears to denote the sequence length, with 19 alternative residues tried at each position). The authors present a far cheaper alternative: bidirectional language models already encode contact information in a small number of attention heads, so contacts can be read out in a single pass.
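The cost gap described above can be made concrete with a little arithmetic. The sketch below assumes that "19L" means 19 substitutions at each of L sequence positions; the function names are hypothetical and are not taken from the paper.

```python
def categorical_jacobian_passes(seq_len: int, n_substitutions: int = 19) -> int:
    """Forward passes needed if every position is mutated to every alternative residue.

    Assumption: "19L" in the summary means 19 substitutions x L positions.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    return n_substitutions * seq_len


def attention_readout_passes(seq_len: int) -> int:
    """The attention-based readout uses one forward pass regardless of length."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    return 1


if __name__ == "__main__":
    for seq_len in (100, 300, 1000):
        cj = categorical_jacobian_passes(seq_len)
        one = attention_readout_passes(seq_len)
        print(f"L={seq_len}: Categorical Jacobian ~{cj} passes, attention readout {one} pass")
```

Under this reading the saving grows with sequence length rather than being a fixed factor. Forward-pass counts are also only a proxy for wall-clock time, since mutated sequences can be batched.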

The work builds on recent advances in using protein language models for structural biology tasks. Previous methods like the Categorical Jacobian represented state-of-the-art approaches for extracting contact predictions, but their computational cost limited practical deployment at scale. The discovery that this signal concentrates in specific attention heads connects to broader findings about language model interpretability and efficient feature extraction.

For computational biologists and researchers developing protein prediction tools, this represents a meaningful efficiency gain: the forward-pass count drops from roughly 19L per protein to one, so the saving scales with sequence length, while accuracy on leakage-clean benchmarks is maintained or improved. The 9 percentage point improvement on the CAMEO split (a rigorous test avoiding pretraining contamination) suggests the method captures genuine structural signal rather than memorized patterns. However, the substantial performance drop between in-distribution and leakage-clean evaluation (30-36 percentage points) indicates that existing benchmarks may overestimate real-world performance due to pretraining overlap.

The finding that causal language models like ProGen2 lose contact signal entirely suggests bidirectional pretraining may be fundamental to this approach. Researchers developing protein language models or deploying contact prediction systems should weigh the efficiency gains against this constraint, since the method may not transfer to model families without bidirectional attention.

Key Takeaways
  • Protein contacts can be predicted from language model attention heads in one forward pass, versus roughly 19L passes (L being the sequence length) for the Categorical Jacobian
  • Contact-relevant information concentrates in a small subset of attention heads; selecting those heads requires only 10 labeled proteins
  • Leakage-clean evaluation reveals substantial pretraining contamination in prior benchmarks, with performance gaps of 30-36 percentage points
  • The method fails on causal language models, suggesting bidirectional pretraining may be essential for encoding pair structure in attention
  • Head selection rather than averaging drives performance gains, indicating supervised attention head discovery is the critical innovation
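To illustrate what supervised head selection could look like, here is a minimal sketch. It is not the authors' procedure: the scoring rule (precision of the top-scoring residue pairs), the minimum sequence separation, and every name below are assumptions made for illustration. Real attention maps would come from one forward pass of a bidirectional protein language model; toy matrices stand in for them here.

```python
from typing import Dict, List, Optional, Set, Tuple

Matrix = List[List[float]]
Head = Tuple[int, int]   # (layer index, head index)
Pair = Tuple[int, int]   # residue pair (i, j) with i < j


def symmetrize(attn: Matrix) -> Matrix:
    """Average a[i][j] and a[j][i], since contacts are undirected."""
    n = len(attn)
    return [[0.5 * (attn[i][j] + attn[j][i]) for j in range(n)] for i in range(n)]


def precision_at_k(scores: Matrix, contacts: Set[Pair], min_sep: int, k: int) -> float:
    """Fraction of the k highest-scoring pairs (|i - j| >= min_sep) that are true contacts."""
    n = len(scores)
    pairs = [(i, j) for i in range(n) for j in range(i + min_sep, n)]
    pairs.sort(key=lambda p: scores[p[0]][p[1]], reverse=True)
    chosen = pairs[:k]
    if not chosen:
        return 0.0
    return sum(p in contacts for p in chosen) / len(chosen)


def select_heads(
    proteins: List[Tuple[Dict[Head, Matrix], Set[Pair]]],
    n_heads: int,
    min_sep: int = 6,              # assumed separation cutoff, not from the source
    top_k: Optional[int] = None,   # defaults to the sequence length
) -> List[Head]:
    """Rank heads by summed precision over a small labeled set and keep the best."""
    totals: Dict[Head, float] = {}
    for attn_by_head, contacts in proteins:
        for head, attn in attn_by_head.items():
            k = top_k if top_k is not None else len(attn)
            score = precision_at_k(symmetrize(attn), contacts, min_sep, k)
            totals[head] = totals.get(head, 0.0) + score
    return sorted(totals, key=lambda h: totals[h], reverse=True)[:n_heads]


def predict(attn_by_head: Dict[Head, Matrix], heads: List[Head]) -> Matrix:
    """Average the symmetrized maps of the selected heads into one contact score map."""
    maps = [symmetrize(attn_by_head[h]) for h in heads]
    n = len(maps[0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(n)] for i in range(n)]


def toy_map(n: int, hot: Set[Pair], hi: float = 0.9, lo: float = 0.01) -> Matrix:
    """Toy attention map: high weight on the 'hot' pairs, low weight elsewhere."""
    m = [[lo] * n for _ in range(n)]
    for i, j in hot:
        m[i][j] = hi
    return m


# Toy labeled protein: one head attends to the true contacts, one does not.
contacts: Set[Pair] = {(0, 7), (1, 6)}
attn_by_head = {(0, 0): toy_map(8, {(0, 4), (2, 5)}), (1, 2): toy_map(8, contacts)}
selected = select_heads([(attn_by_head, contacts)], n_heads=1, min_sep=3, top_k=2)
contact_scores = predict(attn_by_head, selected)
print("selected heads:", selected)
```

In this toy setup the head whose attention coincides with the labeled contacts is the one kept. With real models the labeled set would hold several proteins, and the number of retained heads would be a tuning choice.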