AI · Neutral · arXiv – CS AI · 14h ago · 6/10
Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
This survey reviews end-to-end (E2E) neural architectures for multi-speaker automatic speech recognition (ASR) on monaural audio. It compares the single-input multiple-output (SIMO) and single-input single-output (SISO) paradigms, covers recent algorithmic improvements, and discusses extensions to long-form speech. The work fills a gap in the literature by systematizing recent advances in a field that is moving from cascaded pipelines to unified E2E systems, which handle overlapping speech and speaker attribution better.