Scaling Conversational Hungarian ASR: The BEA-Dialogue+ Corpus
Researchers introduce BEA-Dialogue+, an expanded Hungarian conversational speech recognition corpus that more than doubles training data from 85 to 200 hours while maintaining speaker separation across dataset splits. The expanded resource enables better evaluation of automatic speech recognition (ASR) models and demonstrates that fine-tuning with Serialized Output Training (SOT) improves performance on dialogue transcription tasks.
The development of BEA-Dialogue+ addresses a critical bottleneck in non-English speech recognition research. Hungarian ASR systems have historically been held back by the scarcity of publicly available dialogue data suitable for training, which constrains both model development and real-world deployment. By relaxing certain data split constraints while still keeping speakers separated across splits, researchers have created a more practical benchmark that reflects actual usage scenarios, where dialogue transcription systems encounter varied speakers and conversational patterns.
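To make the speaker separation requirement concrete, the following is a minimal sketch of a check that no speaker appears in more than one split. The record layout (`id`, `split`, `speakers`) and the speaker IDs are hypothetical and chosen for illustration only; the summary does not describe the corpus's actual metadata format or which split constraints were relaxed.

```python
from collections import defaultdict


def overlapping_speakers(dialogues):
    """Return the speakers that occur in more than one split.

    An empty result means the splits are speaker-disjoint.
    The dict layout used here is an assumption for illustration.
    """
    splits_per_speaker = defaultdict(set)
    for dialogue in dialogues:
        for speaker in dialogue["speakers"]:
            splits_per_speaker[speaker].add(dialogue["split"])
    return {spk for spk, splits in splits_per_speaker.items() if len(splits) > 1}


# Hypothetical toy metadata: spk02 appears in both train and test.
dialogues = [
    {"id": "d1", "split": "train", "speakers": ["spk01", "spk02"]},
    {"id": "d2", "split": "train", "speakers": ["spk01", "spk03"]},
    {"id": "d3", "split": "dev", "speakers": ["spk04", "spk05"]},
    {"id": "d4", "split": "test", "speakers": ["spk06", "spk02"]},
]

leaks = overlapping_speakers(dialogues)
print(sorted(leaks))
```

A speaker may appear in several dialogues within the same split (as `spk01` does here) without counting as a leak; only cross-split reuse is flagged.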
This work builds on broader trends in multilingual AI development, where underrepresented languages receive growing research attention as tech companies expand global reach. The corpus design choices, particularly the controlled trade-off study between data volume and speaker overlap, provide insights applicable beyond Hungarian. The evaluation of both Whisper and FastConformer architectures with Serialized Output Training (SOT) shows that architectural choices and fine-tuning strategies significantly affect dialogue transcription quality, with SOT-based adaptation consistently outperforming baseline approaches across multiple error metrics.
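For readers unfamiliar with Serialized Output Training, the core idea is that a multi-speaker recording is transcribed as a single token sequence, with utterances ordered by start time and a special token marking speaker changes. The sketch below builds such a target string. The `<sc>` token name, the utterance fields, and the rule of inserting the token only when the speaker changes are assumptions for illustration; the summary does not specify the exact serialization used for BEA-Dialogue+.

```python
def serialize_sot(utterances, sc_token="<sc>"):
    """Build a single SOT-style target string from a dialogue.

    Utterances are sorted by start time; sc_token is inserted whenever
    the speaker differs from the previous utterance's speaker.
    Field names and the token are illustrative assumptions.
    """
    ordered = sorted(utterances, key=lambda u: u["start"])
    parts = []
    previous_speaker = None
    for utt in ordered:
        if previous_speaker is not None and utt["speaker"] != previous_speaker:
            parts.append(sc_token)
        parts.append(utt["text"])
        previous_speaker = utt["speaker"]
    return " ".join(parts)


# Hypothetical two-speaker exchange, deliberately listed out of time order.
dialogue = [
    {"speaker": "B", "start": 1.2, "text": "szia"},
    {"speaker": "A", "start": 0.0, "text": "jó napot"},
    {"speaker": "A", "start": 3.1, "text": "köszönöm jól"},
    {"speaker": "B", "start": 2.0, "text": "hogy vagy"},
]

target = serialize_sot(dialogue)
print(target)  # jó napot <sc> szia hogy vagy <sc> köszönöm jól
```

During fine-tuning, a string like `target` would serve as the reference transcript for the whole dialogue segment, so the model learns to emit the speaker-change token itself.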
For the speech recognition industry, BEA-Dialogue+ establishes a more challenging and realistic benchmark that should drive algorithm improvements for conversational transcription. The resource has immediate utility for researchers developing Hungarian speech models and broader implications for low-resource language ASR development. The demonstrated effectiveness of SOT-based fine-tuning suggests this approach warrants investigation in other languages. Organizations building multilingual speech systems can use these findings to optimize training strategies, while the publicly available corpus accelerates Hungarian speech and language research and lowers barriers to entry for smaller research teams.
- BEA-Dialogue+ expands Hungarian dialogue ASR training data from 85 to 200 hours while maintaining speaker separation across splits
- Serialized Output Training-based fine-tuning consistently improved performance across word and character error metrics
- The expanded corpus presents greater challenges for baseline models, requiring more sophisticated adaptation strategies
- The dataset design methodology provides a replicable framework for addressing data constraints in low-resource language ASR
- Results validate Whisper and FastConformer architectures for Hungarian conversational speech tasks with proper fine-tuning
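
The word and character error metrics mentioned above are conventionally computed as word error rate (WER) and character error rate (CER): the Levenshtein edit distance between reference and hypothesis, divided by the reference length. The following is a minimal sketch of that standard definition; it is not the evaluation code used in the study, and the example strings are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Invented example: one substituted word and one dropped word.
reference = "a b c d"
hypothesis = "a x c"
print(wer(reference, hypothesis))  # 0.5
print(cer(reference, hypothesis))
```

Because Hungarian is agglutinative, a single wrong suffix counts as a full word error under WER but only a few character errors under CER, which is one reason both metrics are often reported together.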