BCoughBench: Benchmarking Respiratory Acoustic Foundation Models Under Body-Coupled Wearable Sensor Conditions
BCoughBench introduces a standardized evaluation framework for respiratory acoustic foundation models deployed on body-coupled wearable sensors, revealing significant performance degradation compared to smartphone recordings. The study demonstrates that existing models fail to meet clinical thresholds for disease detection when adapted to wearable conditions, though demographic tasks like age regression remain robust.
Moving machine learning models from laboratory conditions to real-world clinical devices exposes a critical validation gap. BCoughBench addresses this gap by stress-testing five foundation models across multiple sensor configurations that simulate how wearable devices attenuate acoustic signals through tissue and bone. This research matters because healthcare systems increasingly adopt body-coupled sensors for continuous monitoring, yet the models powering diagnostic decisions have been validated almost exclusively on smartphone data, which comes from a fundamentally different acoustic environment.
The findings expose a substantial performance cliff. Mean AUROC scores drop from 0.785 on smartphones to 0.689-0.723 under wearable conditions, with disease classification tasks particularly vulnerable. Temple vibration pickup causes the largest degradation (-0.096 AUROC), while in-ear sensors show better resilience. Critically, no evaluated model achieves the clinical sensitivity threshold (≥0.20 at 95% specificity) for most disease tasks under any wearable configuration, raising deployment concerns for diagnostic applications.
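To make the two metrics in this paragraph concrete, here is a minimal Python sketch of how AUROC and sensitivity at a fixed specificity can be computed from labels and scores. It is not BCoughBench's evaluation code: the function names (`auroc`, `sensitivity_at_specificity`) and the toy data are hypothetical and chosen only for illustration.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores
    a random negative, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(labels, scores, target_specificity=0.95):
    """Best sensitivity among thresholds whose specificity meets the target.
    A sample is predicted positive when its score is strictly above the threshold."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    best = 0.0
    for threshold in sorted(set(scores)):
        specificity = sum(n <= threshold for n in neg) / len(neg)
        if specificity >= target_specificity:
            sensitivity = sum(p > threshold for p in pos) / len(pos)
            best = max(best, sensitivity)
    return best


if __name__ == "__main__":
    # Toy data (hypothetical): 20 negatives scored 0..19 and 5 positives.
    labels = [0] * 20 + [1] * 5
    scores = list(range(20)) + [5, 10, 30, 40, 18.5]
    print(auroc(labels, scores))                       # 0.75
    print(sensitivity_at_specificity(labels, scores))  # 0.6

    # Arithmetic from the reported figures: smartphone mean AUROC minus
    # the temple-pickup degradation gives the low end of the wearable range.
    print(round(0.785 - 0.096, 3))                     # 0.689
```

On the toy data, one false positive is allowed at 95% specificity (1 of 20 negatives), and three of the five positives score above the resulting threshold. The last line only checks that the reported figures are mutually consistent.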
The impact varies by task. In aggregate, signal attenuation degrades disease classification more than demographic prediction, but there are exceptions in both directions: COVID detection degrades only minimally, while sex classification collapses on the CIDRZ cohort. One possible explanation is that foundation models encode disease features more heavily in high-frequency content than they do demographic signals. The exceptions, however, indicate that the effect of the acoustic environment is specific to each task rather than universal.
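The high-frequency hypothesis can be illustrated with a toy experiment. The sketch below assumes, purely for illustration, that body coupling behaves like a simple first-order low-pass filter; this summary does not describe how BCoughBench actually simulates its sensor configurations, and the 1 kHz cutoff, the 16 kHz sample rate, and all function names are hypothetical.

```python
import math


def one_pole_lowpass(signal, sample_rate_hz, cutoff_hz):
    """First-order low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def tone(freq_hz, sample_rate_hz, n_samples):
    """Unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def rms(signal):
    """Root-mean-square amplitude."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def retained_fraction(freq_hz, sample_rate_hz=16000, cutoff_hz=1000.0,
                      n_samples=1600):
    """Fraction of a tone's RMS amplitude that survives the assumed filter."""
    x = tone(freq_hz, sample_rate_hz, n_samples)
    return rms(one_pole_lowpass(x, sample_rate_hz, cutoff_hz)) / rms(x)


if __name__ == "__main__":
    for freq in (200, 1000, 4000):
        print(freq, "Hz retains", round(retained_fraction(freq), 2))
```

Under these assumptions a 200 Hz tone passes almost unchanged, while a 4 kHz tone keeps well under half of its amplitude. If a model's disease cues sit mostly in the upper band, they would be suppressed far more than cues carried by low-frequency content, which is one way the task-dependent degradation described above could arise.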
The framework itself enables reproducible evaluation and facilitates model development targeting wearable deployment. Organizations developing respiratory monitoring systems can use BCoughBench to validate models before clinical trials, reducing costly downstream failures. Future research should focus on domain adaptation techniques and architectural modifications optimized for body-coupled sensors rather than assuming smartphone-trained models transfer effectively.
- Foundation models lose 0.062 to 0.096 in mean AUROC (from 0.785 to 0.689-0.723) when moving from smartphone to wearable sensor conditions
- No evaluated model meets the clinical sensitivity threshold for most disease detection tasks under body-coupled sensors
- Performance degradation is task-dependent: demographic tasks such as age regression remain robust, while disease classification generally suffers most
- BCoughBench provides a reproducible evaluation framework that lets developers validate models for wearable deployment before clinical use
- Temple vibration pickup causes the largest performance drop, while soft in-ear configurations show the best resilience