#speech-llm News & Analysis

7 articles tagged with #speech-llm. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Bullish · arXiv – CS AI · 5d ago · 7/10

Liberating LLM Capabilities in Full-Duplex Speech Models

Researchers introduce Listen-Write-Speak (LWS), a new paradigm for speech-based large language models that enables simultaneous text output alongside spoken responses. The approach leverages a single autoregressive LLM with a Token Schema to unlock text-native capabilities like code generation and structured analysis in real-time conversational AI without architectural modifications.

AI · Neutral · arXiv – CS AI · Jun 2 · 7/10

PolySpeech-100: A Large-Scale Benchmark for Speech Understanding Across 100+ Languages and Dialects

Researchers introduce PolySpeech-100, a comprehensive benchmark evaluating speech understanding across 110 languages and dialects, revealing that end-to-end speech-LLMs outperform traditional ASR+LLM systems on dialects but struggle with low-resource languages. The study of 22 state-of-the-art models exposes significant performance gaps and shows that chain-of-thought prompting often degrades speech comprehension, highlighting critical modality alignment issues in current AI architectures.

🧠 Gemini

AI · Bullish · arXiv – CS AI · 5d ago · 6/10

LEAF: Growing Trees Without Branching for Speech-Aware Large Language Model Post-Training

LEAF (Low-rank Exploration with Adaptive Forking) is a tree-based reinforcement learning method for post-training speech-aware large language models. It improves credit assignment by identifying shared response prefixes and assigning rewards at the span level rather than uniformly across tokens. The approach outperforms existing GRPO-style methods without additional computational overhead, enabling smaller models to match or exceed larger baselines.
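To make span-level credit assignment over shared prefixes concrete, here is a minimal Python sketch. It is not the paper's algorithm: the function names, the choice of "mean reward of all responses sharing a prefix" as the prefix value, and the rule for where a span ends are all assumptions made for illustration. Responses in a sampled group are split wherever the set of responses sharing the prefix changes, and each span is credited with the change in prefix value across it.

```python
from collections import defaultdict


def prefix_stats(responses, rewards):
    """Mean reward and sharing count for every token prefix in the group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for tokens, reward in zip(responses, rewards):
        for end in range(len(tokens) + 1):
            prefix = tuple(tokens[:end])
            totals[prefix] += reward
            counts[prefix] += 1
    values = {p: totals[p] / counts[p] for p in totals}
    return values, dict(counts)


def span_credits(responses, rewards):
    """Hypothetical span-level credit: value after the span minus value before it."""
    values, counts = prefix_stats(responses, rewards)
    result = []
    for tokens in responses:
        spans, start = [], 0
        for end in range(1, len(tokens) + 1):
            is_last = end == len(tokens)
            # Close the span when the next token is shared by fewer responses.
            if is_last or counts[tuple(tokens[:end + 1])] != counts[tuple(tokens[:end])]:
                credit = values[tuple(tokens[:end])] - values[tuple(tokens[:start])]
                spans.append((tokens[start:end], credit))
                start = end
        result.append(spans)
    return result


group = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog"]]
credits = span_credits(group, [1.0, 0.0, 0.0])
for spans in credits:
    print(spans)
```

Under these assumptions, the credits of one response sum to its reward minus the group mean, so the shared span "the cat" receives one credit value in both responses that contain it, while the diverging final tokens carry the difference.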

AI · Neutral · arXiv – CS AI · Jun 1 · 6/10

A Unified and Reproducible Experimentation Framework for Speech Understanding

Researchers introduce SURE, a unified experimentation framework that standardizes evaluation metrics and training pipelines for speech understanding models, addressing reproducibility challenges that have hindered fair comparison of speech foundation models and Speech LLMs across different deployment scenarios.

AI · Bullish · arXiv – CS AI · Mar 27 · 6/10

X-OPD: Cross-Modal On-Policy Distillation for Capability Alignment in Speech LLMs

Researchers propose X-OPD, a Cross-Modal On-Policy Distillation framework to improve Speech Large Language Models by aligning them with text-based counterparts. The method uses token-level feedback from teacher models to bridge performance gaps in end-to-end speech systems while preserving inherent capabilities.

AI · Bearish · arXiv – CS AI · Mar 9 · 6/10

The Cascade Equivalence Hypothesis: When Do Speech LLMs Behave Like ASR→LLM Pipelines?

Research reveals that speech LLMs don't perform significantly better than traditional ASR→LLM pipelines in most deployed scenarios. The study shows speech LLMs essentially function as expensive cascades that perform worse under noisy conditions, with advantages reversing by up to 7.6% at 0 dB noise levels.
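For readers unfamiliar with noise conditions quoted in decibels, the following Python sketch shows one common way to build such a test input. It assumes that "0 dB" refers to a signal-to-noise ratio (SNR), which the summary does not state, and the function names are illustrative rather than taken from the study. At 0 dB SNR the noise carries the same average power as the speech.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so speech-to-noise power ratio equals `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise is silent")
    # SNR_dB = 10 * log10(P_speech / (gain**2 * P_noise)), solved for gain.
    gain = math.sqrt(mean_power(speech) / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


def measured_snr_db(speech, noise):
    """SNR in dB between a clean signal and the noise that was added to it."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))


rng = random.Random(0)
# Stand-in signals: a 440 Hz tone at 16 kHz and Gaussian noise (both synthetic).
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 0.1) for _ in range(16000)]
mixed, scaled = mix_at_snr(speech, noise, 0.0)
print(round(measured_snr_db(speech, scaled), 6))
```

The same helper can produce milder conditions by raising `snr_db`; each 10 dB step reduces the added noise power by a factor of ten.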

AI · Neutral · arXiv – CS AI · Mar 11 · 4/10

VoxEmo: Benchmarking Speech Emotion Recognition with Speech LLMs

Researchers introduce VoxEmo, a comprehensive benchmark for evaluating Speech Large Language Models on emotion recognition tasks across 35 emotion corpora and 15 languages. The benchmark addresses evaluation challenges in open text generation and introduces novel protocols that better align with human subjective emotion perception.