AI · Neutral · arXiv – CS AI · 9h ago · 6/10
A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References
This study examines how the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR), the metric commonly used both to train and to evaluate speech separation models, breaks down when the reference signals themselves contain noise. Because SI-SDR scores an output by how closely it matches the reference, a noisy reference penalizes outputs that omit the noise, which exposes fundamental limitations in the current benchmarking approach. To mitigate this, the authors propose enhancing the references before use, although their results indicate that the enhancement process introduces artifacts that limit the overall gain in quality.
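The effect can be illustrated with a minimal Python sketch. It assumes the standard zero-mean SI-SDR definition from the literature, not any formulation specific to this paper, and it does not implement the authors' reference enhancement. It scores two hypothetical outputs against a noisy reference: a perfectly clean estimate and one that simply reproduces the reference noise. The function name `si_sdr`, the 440 Hz test tone, the 16 kHz rate, and the 10 dB noise level are all illustrative assumptions.

```python
import math
import random


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB, using the common zero-mean definition."""
    # Remove each signal's mean, as in the usual SI-SDR formulation.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference: alpha = <est, ref> / ||ref||^2.
    # Assumes the reference is not silent (nonzero energy).
    alpha = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [alpha * r for r in ref]
    # Whatever the scaled reference does not explain counts as distortion.
    distortion = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    # eps keeps the ratio finite when the estimate equals the reference.
    return 10 * math.log10((target_energy + eps) / (distortion_energy + eps))


# Hypothetical example: 1 s of a 440 Hz tone at 16 kHz (signal power 0.5).
rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]

# Noisy reference at about 10 dB SNR: noise power 0.05 vs. signal power 0.5.
rng = random.Random(0)
sigma = math.sqrt(0.5 / 10)
noisy_ref = [s + rng.gauss(0.0, sigma) for s in clean]

# Perfect separation, but scored against the noisy reference.
clean_score = si_sdr(clean, noisy_ref)
# An output that reproduces the reference noise exactly.
copy_score = si_sdr(noisy_ref, noisy_ref)
print(f"clean estimate:      {clean_score:.2f} dB")
print(f"noise-copy estimate: {copy_score:.2f} dB")
```

Under these assumptions the clean estimate lands near 10 dB, roughly the reference's own SNR, because the noise it omits is counted as distortion. The noise-copying estimate is limited only by the `eps` guard, so the metric ranks it far above the genuinely clean output.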