Stanford Medicine and researchers in Denmark and Boston have built an AI foundation model, SleepFM, that can read a single night of lab‑recorded sleep and forecast the risk of more than 100 diseases years before they appear in the clinic. This feels like the moment sleep medicine meets large‑scale, multimodal representation learning in a very real way.

What is SleepFM?

SleepFM is a multimodal “sleep foundation model” trained on over 585,000 hours of polysomnography (PSG) from about 65,000 individuals across multiple cohorts, including the Stanford Sleep Clinic, BioSerenity labs, and large epidemiological studies such as MESA and MrOS. Each overnight study yields a dense stream of brain (EEG/EOG), cardiac (ECG), muscle (EMG), and respiratory data, all sampled at 128 Hz and segmented into 5-second tokens, much like the “words” of a physiology language model.
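
The 128 Hz, 5-second tokenization described above can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's preprocessing code; the handling of trailing samples (dropped here) is an assumption.

```python
import numpy as np

SAMPLE_RATE = 128   # Hz, the sampling rate described for SleepFM's inputs
TOKEN_SECONDS = 5   # each token covers 5 seconds of signal

def tokenize(signal: np.ndarray) -> np.ndarray:
    """Split a 1-D signal into non-overlapping 5-second tokens.

    Trailing samples that do not fill a whole token are dropped
    (an assumption; the actual pipeline may pad instead).
    """
    samples_per_token = SAMPLE_RATE * TOKEN_SECONDS   # 640 samples per token
    n_tokens = len(signal) // samples_per_token
    return signal[: n_tokens * samples_per_token].reshape(n_tokens, samples_per_token)

# One minute of synthetic EEG-like noise becomes 12 tokens of 640 samples each.
minute = np.random.randn(SAMPLE_RATE * 60)
tokens = tokenize(minute)
print(tokens.shape)  # (12, 640)
```

At this rate, a full eight-hour night of one channel is 5,760 such tokens, which is the sequence the downstream transformer layers operate over.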

What sets this model apart is its channel-agnostic design: SleepFM can adapt to the differing PSG montages used at different study sites. Each channel is first embedded with 1D convolutions, and transformer blocks with attention then pool across channels and time. This lets SleepFM operate across clinical settings regardless of which electrodes or belts were used.
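
The key property of that design, a fixed-size output no matter how many channels a site records, can be sketched with a toy per-channel convolution plus attention pooling. All parameters here are random stand-ins (in SleepFM they are learned), and the architecture is heavily simplified.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 16

# Hypothetical parameters; the real model learns these during training.
conv_kernel = rng.standard_normal((EMBED_DIM, 32))  # 1-D conv filters, width 32
attn_query = rng.standard_normal(EMBED_DIM)         # pooling query over channels

def embed_channel(x: np.ndarray) -> np.ndarray:
    """Embed one channel: strided 1-D convolution, then average over time."""
    width = conv_kernel.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x, width)[::width]
    feats = windows @ conv_kernel.T      # (n_windows, EMBED_DIM)
    return feats.mean(axis=0)            # (EMBED_DIM,)

def pool_channels(channels: list) -> np.ndarray:
    """Attention-pool a variable number of channel embeddings into one vector,
    so the montage (how many/which channels) never changes the output shape."""
    E = np.stack([embed_channel(c) for c in channels])  # (n_channels, EMBED_DIM)
    scores = E @ attn_query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ E                                   # (EMBED_DIM,)

# A 4-channel montage and a 7-channel montage yield the same embedding shape:
z4 = pool_channels([rng.standard_normal(640) for _ in range(4)])
z7 = pool_channels([rng.standard_normal(640) for _ in range(7)])
print(z4.shape, z7.shape)  # (16,) (16,)
```

The softmax-weighted sum is what makes the channel dimension disappear: downstream layers only ever see a 16-dimensional vector, whatever the recording setup.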

Learning the “language” of sleep

To get past the constraints of traditional, label-hungry models, the researchers pre-train SleepFM with a method called leave-one-out contrastive learning, a form of self-supervision. For each 5-minute segment, one modality is held out (for instance, the ECG), and the model must align its representation with a summary derived from the remaining modalities (EEG/EOG, EMG, respiratory). This compels the network to learn what the different physiological modalities have in common.
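
A toy version of that objective can be written as an InfoNCE-style loss: the held-out modality's embedding must match the average of the other modalities for the same segment, against the other segments in the batch. This is a simplified sketch under assumed conventions (L2-normalized embeddings, a temperature of 0.1); the paper's exact formulation may differ.

```python
import numpy as np

def leave_one_out_infonce(mods: np.ndarray, held_out: int, temp: float = 0.1) -> float:
    """Toy leave-one-out contrastive (InfoNCE) loss.

    mods: (n_modalities, batch, dim) array of L2-normalized embeddings.
    The held-out modality's embedding should be closest to the mean of
    the remaining modalities for the *same* segment (the positive pair),
    relative to every other segment in the batch (the negatives).
    """
    target = mods[held_out]                                # (batch, dim)
    rest = np.delete(mods, held_out, axis=0).mean(axis=0)  # (batch, dim)
    rest /= np.linalg.norm(rest, axis=1, keepdims=True)
    logits = target @ rest.T / temp                        # (batch, batch)
    # Cross-entropy with the matching segment index as the positive class.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))

# Four segments with orthogonal embeddings; all three modalities agree,
# so the loss is near zero (the positive dominates every row).
eye = np.eye(4, 8)
aligned = np.stack([eye, eye, eye])
print(leave_one_out_infonce(aligned, held_out=0))
```

Minimizing this across batches, with each modality taking a turn as the held-out one, is what forces the shared physiological structure into the embeddings.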

After pretraining, these token embeddings become versatile representations for downstream tasks. Lightweight LSTM-based heads fine-tuned on top of them predict age and sex, classify sleep stages, detect sleep apnea, and, most ambitiously, predict disease risk in a survival-analysis framework. With only these modest fine-tuned heads, the model achieves competitive F1 scores of approximately 0.70–0.78 for sleep stage classification and strong accuracy for apnea severity, rivaling state-of-the-art task-specific architectures such as U-Sleep, YASA, GSSC, and STAGES.

Predicting disease years in advance

The boldest aspect of this research is the phenome-wide study of the relationship between sleep and disease. At Stanford, PSGs from approximately 35,000 patients (1999–2024) were linked to over 25 years of electronic health records, with ICD-9/10 codes mapped to 1,041 phecodes. Using a multilabel Cox proportional hazards loss, SleepFM learned to predict time-to-event for each condition from a single baseline night of sleep.
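
For one condition, the Cox proportional hazards loss boils down to a negative partial log-likelihood over "risk sets" of patients still under observation. A minimal sketch (Breslow-style, ignoring tied event times, which the paper's multilabel implementation would handle per phecode):

```python
import numpy as np

def cox_partial_nll(risk: np.ndarray, time: np.ndarray, event: np.ndarray) -> float:
    """Negative Cox partial log-likelihood for one disease (phecode).

    risk  : (n,) model-predicted risk scores (log-hazard ratios)
    time  : (n,) follow-up time until diagnosis or censoring
    event : (n,) 1 if the diagnosis occurred, 0 if censored
    For each observed event, the risk set is everyone whose follow-up
    extends at least that far. A multilabel version sums this over phecodes.
    """
    order = np.argsort(-time)                # sort by descending follow-up time
    r, e = risk[order], event[order]
    # Running log-sum-exp gives the denominator over each event's risk set.
    log_cum = np.logaddexp.accumulate(r)
    return -float(np.sum((r - log_cum)[e == 1]) / max(e.sum(), 1))

# Scoring early-event patients higher yields a smaller loss than the reverse.
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 1, 0, 0])
print(cox_partial_nll(np.array([2.0, 1.0, 0.0, 0.0]), time, event))  # good model
print(cox_partial_nll(np.array([0.0, 0.0, 1.0, 2.0]), time, event))  # bad model
```

The appeal of this loss is that it needs only relative event ordering, not a calibrated event-time distribution, which suits noisy EHR-derived outcomes.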

Out of these 1,041 disease categories, 130 could be predicted with a concordance index and 6‑year AUROC of at least 0.75, after stringent Bonferroni correction. Performance was particularly striking for:

  • Neurological and cognitive outcomes such as Parkinson’s disease, dementia, mild cognitive impairment, and developmental disorders.
  • Circulatory phenotypes, including hypertensive heart disease, heart failure, stroke, and intracranial hemorrhage.
  • Neoplasms like prostate, breast, and skin cancers, as well as chronic kidney disease and all‑cause mortality.

For mortality, SleepFM achieved a C-index of approximately 0.84, clearly outperforming both a demographics-only model (age, sex, BMI, race/ethnicity) and an end-to-end PSG model trained from scratch with the same architecture but no pretraining. Comparable improvements were observed across categories including neurological, endocrine/metabolic, respiratory, and hematopoietic disease, suggesting that the foundation-model embeddings pick up subtle, cross-domain signatures that simpler models miss.
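
The C-index (Harrell's concordance index) reported here has a simple interpretation: among comparable patient pairs, how often did the model assign the higher risk to the patient who got sick sooner? A value of 0.5 is chance; 1.0 is perfect ranking. A small illustrative implementation, not the paper's evaluation code:

```python
def c_index(risk, time, event):
    """Harrell's concordance index: the fraction of comparable pairs where
    the patient with the earlier observed event also received the higher
    risk score (ties in risk count as half-concordant)."""
    concordant = permissible = 0.0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # pairs are only comparable anchored on an observed event
        for j in range(n):
            if time[j] > time[i]:  # patient j outlasted patient i's event
                permissible += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / permissible

# Risks perfectly ordered against event times give a C-index of 1.0.
print(c_index([3, 2, 1], [1, 2, 3], [1, 1, 0]))  # 1.0
```

So a mortality C-index of ~0.84 means that in roughly 84% of such pairs, the patient SleepFM flagged as higher-risk was indeed the one who died first.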

How does the model “see” disease in sleep?

The authors add a measure of interpretability by stratifying performance across sleep stages and modalities. EEG/EOG brain-activity signals carry the most information for mental and neurological disorders, respiratory channels matter most for respiratory and metabolic disorders, and ECG dominates for circulatory disorders; the highest performance is consistently attained by combining all modalities.

Some stages, such as N1/N2 and REM, carry somewhat more relevant information for certain cardiovascular and neurodegenerative outcomes. This is consistent with previous findings that REM disruption, arousal burden, and sleep inefficiency are associated with higher mortality and dementia risk. The authors also show that a variant trained on a single modality, e.g., ECG only, loses accuracy. This reinforces the idea that desynchrony between systems, like a “sleeping” cortex paired with an “awake” heart, is a relevant predictor of deteriorating health.

Why this matters

SleepFM shows how foundation models can turn raw, heterogeneous physiological data into reusable, task-agnostic embeddings. Even when fine-tuned with only 10% of labeled data on an external cohort (SHHS), the model exceeds supervised baselines trained on far more annotations, and it continues to perform well for stroke, heart failure, and cardiovascular death. A single overnight study could thus become a rich, multi-dimensional risk profile for neurodegeneration, cardiometabolic disease, cancer, and more, well before overt symptoms appear. As wearable sleep-tracking technologies mature, foundation-model variants could extend beyond the lab to low-burden, continuous real-world monitoring, especially when combined with omics, imaging, and longitudinal EHR data.

There are caveats: several of the cohorts are enriched for patients referred to specialist sleep clinics; temporal drift has been shown to slightly degrade performance on the most recent data; and individualized, fine-grained interpretability remains to be tackled.

Conclusion

SleepFM demonstrates that our nightly physiology encodes far more about future disease than conventional measures, such as the apnea-hypopnea index or total sleep time, would suggest, and that foundation models are finally powerful enough to read that hidden signal.

Article Source: Reference Paper | Reference Article | Code Availability: GitHub.

Disclaimer:
The research discussed in this article was conducted and published by the authors of the referenced paper. CBIRT has no involvement in the research itself. This article is intended solely to raise awareness about recent developments and does not claim authorship or endorsement of the research.

Author

Anchal is a consulting scientific writing intern at CBIRT with a passion for bioinformatics and its miracles. She is pursuing an MTech in Bioinformatics from Delhi Technological University, Delhi. Through engaging prose, she invites readers to explore the captivating world of bioinformatics, showcasing its groundbreaking contributions to understanding the mysteries of life. Besides science, she enjoys reading and painting.
