🧠 AI · 🟢 Bullish · Importance 7/10

CORTIS: Text-Only Adaptation of Spoken Language Models for Task-Oriented Voice Agents

arXiv – CS AI | Youngwon Choi, Hyeonyu Kim, Taeyoun Kwon, Donghyuk Jung, Myeongkyun Cho | June 23, 2026 at 04:00 AM

🤖 AI Summary

Researchers introduce CORTIS, a framework that enables spoken language models (SLMs) to handle task-oriented voice agent functions using only text-based training data, eliminating the need for expensive paired speech-target annotations. The approach matches or outperforms traditional ASR-LLM cascades while demonstrating superior robustness under acoustic degradation.

Analysis

CORTIS addresses a critical bottleneck in deploying task-oriented voice agents: the expense and complexity of collecting paired speech-target training data. By letting SLMs learn task semantics from text-only supervision and apply them to speech at inference time, the framework lowers the data-collection cost of building voice AI applications. The approach sits between two established options: cascaded ASR-LLM systems, in which transcription errors propagate into the downstream model in noisy environments, and fully speech-supervised models, which demand costly annotation.
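The architectural contrast can be sketched in a few lines of Python. Everything here is a hypothetical toy: `noisy_asr` and `keyword_llm` stand in for real ASR and LLM components, and the mishearing of "lights" as "flights" is invented to show how a single transcription error flows unchecked into a cascade's task output. The `direct` function only marks where an SLM would sit; it does not model CORTIS itself.

```python
from typing import Callable, Dict

Audio = bytes
TaskOutput = Dict[str, str]


def cascade(audio: Audio,
            asr: Callable[[Audio], str],
            llm: Callable[[str], TaskOutput]) -> TaskOutput:
    """Two-stage pipeline: speech -> transcript -> task output."""
    transcript = asr(audio)  # any recognition error is frozen into this string
    return llm(transcript)   # the text model never sees the audio, so it cannot undo the error


def direct(audio: Audio, slm: Callable[[Audio], TaskOutput]) -> TaskOutput:
    """Single-stage pipeline: a spoken language model maps speech straight to the task output."""
    return slm(audio)


# Hypothetical stand-ins used only to illustrate error propagation.
def noisy_asr(audio: Audio) -> str:
    return "turn on the flights"  # toy misrecognition of "turn on the lights"


def keyword_llm(text: str) -> TaskOutput:
    if "flight" in text:
        return {"intent": "book_flight"}
    if "light" in text:
        return {"intent": "toggle_lights"}
    return {"intent": "unknown"}


print(cascade(b"<audio: 'turn on the lights'>", noisy_asr, keyword_llm))
# prints {'intent': 'book_flight'}
```

The text model behaves correctly on the text it receives; the wrong intent comes entirely from the first stage, which is the failure mode an end-to-end SLM is meant to avoid.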

The approach builds on a growing recognition that modern language models can transfer knowledge across modalities more effectively than previously assumed. Rather than requiring explicit speech-task pairs, CORTIS relies on the speech-text alignment SLMs already acquire during pre-training: the model is fine-tuned on structured task outputs presented in text form, and at inference the same model receives speech. This methodology fits a broader trend in multimodal AI, where a single model handles multiple input types through a unified representation.
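"Structured task outputs presented in text form" can be illustrated with a minimal sketch of one text-only training record. The intent/slot schema, the field names `input_text` and `target_text`, and the helpers `make_text_only_example` and `parse_task_output` are all hypothetical; the summary does not describe CORTIS's actual prompt template or output format. The point is only that such a record needs no audio, while the adapted model would later be asked to emit the same target format from speech.

```python
import json


def make_text_only_example(utterance: str, intent: str, slots: dict) -> dict:
    """Build one hypothetical fine-tuning record: text input, structured text target."""
    # sort_keys makes the serialized target deterministic across runs
    target = json.dumps({"intent": intent, "slots": slots}, sort_keys=True)
    return {"input_text": utterance, "target_text": target}


def parse_task_output(text: str) -> dict:
    """Parse a model's text output back into a structure; raise if it breaks the schema."""
    obj = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(obj, dict) or set(obj) != {"intent", "slots"}:
        raise ValueError(f"unexpected task output: {text!r}")
    return obj


record = make_text_only_example("set an alarm for 7 am", "set_alarm", {"time": "07:00"})
print(record["target_text"])
# prints {"intent": "set_alarm", "slots": {"time": "07:00"}}
```

Keeping the target as plain serialized text means training and inference share one output format, so the same parser can validate outputs whether the input was typed or spoken.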

For the voice AI industry, CORTIS potentially lowers barriers to entry for smaller teams and organizations building task-oriented agents. Companies developing voice-controlled systems for customer service, smart home environments, or enterprise applications could significantly reduce training data collection costs. The demonstrated resilience under acoustic degradation is particularly valuable for real-world deployments where background noise, accents, and speech variations remain persistent challenges.
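The summary does not say how acoustic degradation was simulated in the evaluation. One common way to stress-test a voice pipeline is to add noise at a controlled signal-to-noise ratio (SNR); the sketch below is a generic version of that technique, not the paper's protocol, and the 16 kHz sine-wave test signal and 10 dB level are arbitrary choices.

```python
import math
import random
from typing import List


def rms(x: List[float]) -> float:
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(clean: List[float], noise: List[float], snr_db: float) -> List[float]:
    """Add noise to a clean signal, scaled so the mixture has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise signal is silent")
    # SNR_dB = 20 * log10(rms_clean / rms_noise)  =>  target noise RMS = rms_clean / 10^(SNR_dB / 20)
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    scale = target_noise_rms / noise_rms
    return [s + scale * n for s, n in zip(clean, noise)]


random.seed(0)
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]  # 1 s, 440 Hz tone
noise = [random.gauss(0.0, 1.0) for _ in range(sample_rate)]
mixed = mix_at_snr(clean, noise, snr_db=10.0)

residual = [m - c for m, c in zip(mixed, clean)]
measured = 20 * math.log10(rms(clean) / rms(residual))
print(round(measured, 6))
# prints 10.0
```

Sweeping `snr_db` downward and tracking task accuracy at each level is one way to compare how quickly a cascade and an end-to-end SLM degrade.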

Competitive parity with ASR-LLM baselines, combined with better noise robustness, suggests SLMs may be preferable to cascades for task-oriented voice work, although parity alone does not establish them as the optimal architecture. Future development will likely focus on scaling text-only adaptation to more specialized domains and on edge cases where fine-grained meaning must survive the mapping from speech to task output.

Key Takeaways

→CORTIS enables task-oriented voice agents to be trained with text-only supervision, eliminating the need for expensive paired speech-target annotations.
→The framework matches ASR-LLM cascade performance while offering superior robustness under noisy acoustic conditions.
→Text-only fine-tuning of SLMs demonstrates effective knowledge transfer for task semantics without explicit speech-task pairs.
→The approach could substantially reduce development costs for voice AI applications by removing the need to collect paired speech data.
→Results suggest spoken language models may be preferable to cascaded ASR-LLM systems for task-oriented voice agent deployment.

#voice-ai #spoken-language-models #task-oriented-agents #nlp #multi-modal-ai #cortis #slm #text-adaptation

