
Next-Token Prediction Learns Generalisable Representations of Sleep Physiology

arXiv – CS AI | Jonathan F. Carter, Lionel Tarassenko | June 9, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce Hypnos, a multi-modal foundation model trained with a next-token prediction objective that learns generalizable representations of sleep physiology from more than 20,000 polysomnography recordings spanning eight sensing modalities. On sleep stage classification, the model matches supervised baselines while using 100× less labeled data, and it generalizes across domains by outperforming specialized models on daytime cardiac tasks.

Analysis

Hypnos represents a meaningful advance in self-supervised learning for physiological signal processing, addressing fundamental limitations of existing approaches. Masked-reconstruction and contrastive objectives have dominated foundation-model development for physiological signals, but both carry inherent drawbacks: masked reconstruction struggles with the stochastic nature of biological signals, and contrastive learning requires well-defined positive pairs even though the semantic invariances of physiological data are poorly understood. Next-token prediction sidesteps both problems. The model simply learns to predict each upcoming token from the ones before it, the same autoregressive objective that has scaled successfully in language models.
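To make the objective concrete, here is a minimal sketch of the next-token prediction loss on a discrete token sequence. It assumes the signals have already been tokenized, and it swaps in a smoothed bigram count model for the RQ-Transformer purely to keep the example self-contained; the token sequence is invented. The loss is the average negative log-likelihood of each token given what came before, so training needs neither a masking scheme nor a definition of positive pairs.

```python
import math
from collections import Counter, defaultdict

def fit_bigram(tokens, vocab_size, alpha=1.0):
    """Toy autoregressive model: add-alpha smoothed estimate of p(next | previous).
    A transformer would condition on the whole prefix, not just one token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    probs = {}
    for prev in range(vocab_size):
        total = sum(counts[prev].values()) + alpha * vocab_size
        probs[prev] = [(counts[prev][v] + alpha) / total for v in range(vocab_size)]
    return probs

def next_token_nll(tokens, probs):
    """Mean negative log-likelihood of each token given its predecessor:
    the next-token prediction loss. No masks or positive pairs are needed."""
    losses = [-math.log(probs[prev][nxt]) for prev, nxt in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)

# Hypothetical discretized signal with a repeating pattern (vocabulary of 4 tokens).
tokens = [0, 1, 2, 3] * 50
model = fit_bigram(tokens, vocab_size=4)
loss = next_token_nll(tokens, model)
uniform_loss = math.log(4)  # loss of a model that has learned nothing
```

Because the toy sequence is perfectly periodic, the loss falls far below the uniform baseline of log 4; on real physiological tokens the gap would reflect how predictable the signal actually is.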

The technical implementation reflects careful engineering: each of the eight modalities is tokenized via residual vector quantization, and an RQ-Transformer is trained to predict across all streams simultaneously. This architecture supports inference on variable subsets of sensors, a practical requirement in heterogeneous clinical environments where not every recording includes every modality. The scale of the training data (20,000+ overnight recordings) provides a substantial statistical basis for representation learning.
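Residual vector quantization is what turns continuous sensor streams into discrete tokens that a next-token objective can predict. The sketch below is a minimal illustration, not Hypnos's tokenizer: the codebooks are tiny, fixed, and made up (a real RVQ tokenizer learns them), and the modality names and the `tokenize_available` helper are hypothetical stand-ins for how a recording with missing sensors might contribute tokens only for the channels it has. Each stage quantizes what the earlier stages left over, producing a short stack of codes per input vector; an RQ-Transformer is designed to predict such stacks.

```python
def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance to vec."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))

def rvq_encode(vec, codebooks):
    """Residual vector quantization: each stage quantizes the residual left by
    the previous stages and emits one token (a codeword index) per stage."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct the vector by summing the chosen codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

def tokenize_available(frames, codebooks_by_modality):
    """Hypothetical per-epoch layout: {modality: feature vector}. Only sensors
    that were actually recorded produce tokens; absent ones are simply skipped."""
    return {m: rvq_encode(vec, codebooks_by_modality[m])
            for m, vec in frames.items() if m in codebooks_by_modality}

# Made-up 2-D codebooks: a coarse first stage and a finer second stage.
codebooks = [
    [[0.0, 0.0], [4.0, 4.0], [-4.0, 4.0]],
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
]
codes = rvq_encode([5.0, 4.0], codebooks)   # -> [1, 1]
recon = rvq_decode(codes, codebooks)        # -> [5.0, 4.0]

# A recording with only two of three (hypothetical) modalities present.
tokens = tokenize_available({"ECG": [5.0, 4.0], "EEG": [-4.0, 5.0]},
                            {"ECG": codebooks, "EEG": codebooks, "SpO2": codebooks})
```

Adding stages trades a longer code stack per vector for a finer reconstruction: stopping after the first stage here would give [4.0, 4.0] instead of the exact [5.0, 4.0].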

The experimental results have clear practical implications. Matching supervised performance on sleep stage classification with 100× less labeled data has direct healthcare value, because expert sleep-stage annotation is expensive and time-consuming. Generalization to atrial fibrillation detection, a task that is out of distribution both temporally (daytime rather than overnight data) and mechanistically (cardiac rhythm rather than sleep staging), suggests the learned representations capture general physiological structure rather than task-specific patterns.
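The article does not describe how the 100× label-efficiency result is measured. A common protocol for such claims is to freeze the pretrained encoder and train only a lightweight classifier on a small labeled subset; the sketch below assumes that protocol and uses a nearest-centroid probe as a swapped-in stand-in for whatever head the authors trained. The 2-D embeddings are synthetic, and the `Wake`/`N3` labels and all helper names are illustrative only.

```python
import math
import random

def stratified_subset(labels, fraction, seed=0):
    """Pick `fraction` of the indices of each class (at least one per class);
    fraction=0.01 keeps 1% of the labels, i.e. 100x fewer."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    chosen = []
    for idxs in by_class.values():
        k = max(1, round(len(idxs) * fraction))
        chosen.extend(rng.sample(idxs, k))
    return sorted(chosen)

def fit_centroids(embeddings, labels, subset):
    """Class means of the frozen embeddings, using only the labeled subset."""
    sums, counts = {}, {}
    for i in subset:
        emb, lab = embeddings[i], labels[i]
        acc = sums.setdefault(lab, [0.0] * len(emb))
        for d, x in enumerate(emb):
            acc[d] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [x / counts[lab] for x in acc] for lab, acc in sums.items()}

def predict(centroids, emb):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], emb))

# Synthetic stand-in for frozen encoder outputs: two well-separated clusters.
rng = random.Random(1)
embeddings, labels = [], []
for lab, center in (("Wake", (0.0, 0.0)), ("N3", (5.0, 5.0))):
    for _ in range(500):
        embeddings.append([c + rng.uniform(-1, 1) for c in center])
        labels.append(lab)

subset = stratified_subset(labels, fraction=0.01)
centroids = fit_centroids(embeddings, labels, subset)
accuracy = sum(predict(centroids, e) == y for e, y in zip(embeddings, labels)) / len(labels)
```

Keeping `fraction=0.01` of the labels per class corresponds to the 100× reduction; stratifying guarantees every class gets at least one labeled example, which matters when rare sleep stages could otherwise be missed by chance.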

For the broader AI sector, this adds evidence that next-token prediction works as a general principle for multimodal representation learning beyond language. Healthcare stands to benefit in particular, since efficient pre-training lowers the cost of developing downstream clinical applications. Likely next steps include extending the approach to other multimodal physiological domains and testing whether the learned representations transfer to diagnostic tasks beyond those demonstrated.

Key Takeaways

→Next-token prediction avoids key weaknesses of masked reconstruction and contrastive learning for self-supervised physiological signal modeling
→Hypnos matches supervised baseline performance on sleep stage classification with 100× less labeled training data
→The foundation model generalizes across tasks and temporal domains, outperforming specialized models on out-of-distribution daytime atrial fibrillation detection
→Multi-modal tokenization via residual vector quantization enables flexible inference with variable sensor subsets
→Representation learning at the scale of 20,000+ polysomnography recordings supports the viability of foundation models for healthcare applications