AIBullisharXiv – CS AI · 7h ago6/10
🧠
MURMUR: An Efficient Inference System for Long-Form ASR
Researchers introduce Murmur, an inference system that optimizes long-form automatic speech recognition by balancing accuracy and latency through a two-level approach: intermediate chunk sizes at the inter-chunk level and attention sparsity exploitation at the intra-chunk level. The system achieves 4.2x latency reduction while maintaining single-pass accuracy on benchmark tests.