AI · Neutral · arXiv – CS AI · 7h ago · 5/10
Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus
Researchers introduce BEA-Dialogue+, an expanded Hungarian conversational speech recognition corpus that more than doubles the training data, from 85 to 200 hours, while keeping speakers separated across dataset splits. The expanded resource enables better evaluation of automatic speech recognition models and shows that specialized fine-tuning techniques improve performance on dialogue transcription tasks.
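
To make "speaker separation across dataset splits" concrete, here is a minimal Python sketch of a speaker-disjoint split. It is not the procedure used to build BEA-Dialogue+. The function name `speaker_disjoint_split`, the `(utterance_id, speaker_id)` input format, and the split fractions are all assumptions made for illustration. The sketch also ignores that a dialogue involves more than one speaker, which a real conversational corpus would have to handle.

```python
import random
from collections import defaultdict


def speaker_disjoint_split(utterances, dev_frac=0.1, test_frac=0.1, seed=0):
    """Assign whole speakers to train/dev/test so no speaker spans two splits.

    `utterances` is a list of (utterance_id, speaker_id) pairs.
    This input format is hypothetical, not the corpus's actual layout.
    """
    # Group utterance ids under the speaker who produced them.
    by_speaker = defaultdict(list)
    for utt_id, speaker in utterances:
        by_speaker[speaker].append(utt_id)

    # Shuffle speakers (not utterances) with a fixed seed for repeatability.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)

    # Reserve at least one speaker each for dev and test.
    n_dev = max(1, round(len(speakers) * dev_frac))
    n_test = max(1, round(len(speakers) * test_frac))
    groups = {
        "dev": speakers[:n_dev],
        "test": speakers[n_dev:n_dev + n_test],
        "train": speakers[n_dev + n_test:],
    }

    # Each split maps speaker id -> that speaker's utterance ids.
    return {
        name: {spk: by_speaker[spk] for spk in spks}
        for name, spks in groups.items()
    }
```

Because speakers rather than individual utterances are shuffled and partitioned, a model evaluated on the dev or test portion never hears a voice it saw during training, which is the property the summary describes.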