SFL-MTSC: Leveraging Semantic Frame-Level Multi-Task Self-Consistency for Robust Multi-Intent Spoken Language Understanding
Researchers propose SFL-MTSC, a framework that improves spoken language understanding in large language models by addressing inconsistent intent-slot structures in multi-intent scenarios. Using semantic frame-level aggregation instead of simple majority voting, the method shows improved slot F1 and accuracy on the MAC-SLU benchmark while maintaining stable intent recognition.
This research addresses a fundamental challenge in deploying large language models for spoken language understanding: stochastic LLM decoding produces structurally inconsistent predictions, which is particularly problematic for utterances containing multiple intents. SFL-MTSC is a post-processing method rather than a change to model architecture, so it can be applied to existing LLM deployments without retraining.
The technical contribution centers on decomposing predictions into semantic frames at a granular level, then applying domain-intent grouping and slot-level clustering to evaluate which predicted structures are reliable. This hierarchical approach is more sophisticated than traditional ensemble methods that simply aggregate outputs through majority voting, allowing the framework to leverage the strengths of multiple inference paths while filtering out inconsistent or low-confidence predictions.
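To make the idea of frame-level aggregation concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Frame` layout, the function name `aggregate_frames`, the two thresholds, and the use of simple support ratios as a reliability score are all assumptions made for illustration, and the slot "clustering" is reduced to exact matching on normalized values.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical frame layout: (domain, intent, {slot_name: slot_value}).
Frame = Tuple[str, str, Dict[str, str]]


def aggregate_frames(samples: List[List[Frame]],
                     frame_threshold: float = 0.5,
                     slot_threshold: float = 0.5) -> List[Frame]:
    """Merge several sampled predictions frame by frame (illustrative only)."""
    n = len(samples)
    if n == 0:
        return []

    # Domain-intent grouping: collect the slot dicts predicted for each
    # (domain, intent) pair, counting a pair at most once per sample.
    groups: Dict[Tuple[str, str], List[Dict[str, str]]] = defaultdict(list)
    for frames in samples:
        seen = set()
        for domain, intent, slots in frames:
            key = (domain, intent)
            if key in seen:
                continue
            seen.add(key)
            groups[key].append(slots)

    result: List[Frame] = []
    for (domain, intent), slot_dicts in groups.items():
        # Assumed reliability score: share of samples containing this frame.
        if len(slot_dicts) / n < frame_threshold:
            continue

        # Slot-level clustering, simplified to exact match on normalized values.
        votes: Dict[str, Counter] = defaultdict(Counter)
        for slots in slot_dicts:
            for name, value in slots.items():
                votes[name][value.strip().lower()] += 1

        merged: Dict[str, str] = {}
        for name, counter in votes.items():
            value, count = counter.most_common(1)[0]
            # Keep a slot only if enough frames in the group agree on it.
            if count / len(slot_dicts) >= slot_threshold:
                merged[name] = value
        result.append((domain, intent, merged))
    return result


# Three sampled outputs for one two-intent utterance (made-up data).
samples = [
    [("music", "play_song", {"artist": "Queen"}),
     ("navigation", "set_destination", {"place": "home"})],
    [("music", "play_song", {"artist": "queen", "album": "Jazz"}),
     ("navigation", "set_destination", {"place": "home"})],
    [("music", "play_song", {"artist": "Queen"}),
     ("weather", "query", {})],
]
print(aggregate_frames(samples))
```

In this made-up example the three sampled outputs all differ as whole structures, so exact-match voting over complete outputs has no majority to pick. Working at the frame level still recovers the two frames that most samples agree on, while dropping the weakly supported `weather` frame and the `album` slot.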
For the broader AI community, this work demonstrates a practical approach to improving LLM reliability without requiring larger models or substantial computational overhead. The zero-shot results on MAC-SLU indicate that the method works without task-specific fine-tuning, suggesting applicability across diverse SLU scenarios. The stable intent accuracy across settings is particularly important: it shows the method does not sacrifice intent recognition while improving slot-level metrics.
Future development should focus on whether this semantic frame-level aggregation approach scales to more complex multi-turn dialogues and whether similar principles apply to other structured prediction tasks in NLP. The research opens pathways for cost-effective reliability improvements in production LLM systems, particularly valuable for enterprise applications requiring consistent structured outputs.
- SFL-MTSC improves slot F1 and accuracy over single-path LLM inference by aggregating predictions at the semantic frame level rather than output level.
- The framework applies domain-intent grouping and cluster reliability scoring to filter unreliable predictions before final output generation.
- Zero-shot evaluation on MAC-SLU demonstrates the method's generalization capability without task-specific fine-tuning or retraining.
- Intent accuracy remains stable while slot-level metrics improve, indicating the approach optimizes secondary task performance without degrading primary objectives.
- The technique is applicable to existing deployed LLMs without architectural changes or retraining, enabling cost-effective reliability enhancement.