#speech-understanding News & Analysis

2 articles tagged with #speech-understanding. AI-curated summaries with sentiment analysis and key takeaways from 50+ sources.

AI · Neutral · arXiv – CS AI · Jun 5 · 6/10

ProSarc: Prosody-Aware Sarcasm Recognition Framework via Temporal Prosodic Incongruity

Researchers introduce ProSarc, an audio-only machine learning framework that detects sarcasm by analyzing temporal mismatches between local prosodic patterns and overall emotional tone. The model performs strongly on multiple datasets (F1 = 75.3 on MUStARD++) and generalizes across languages, advancing the computational understanding of spoken sarcasm.

AI · Neutral · arXiv – CS AI · Apr 13 · 6/10

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

Researchers introduce AV-SpeakerBench, a new 3,212-question benchmark that evaluates how well multimodal large language models understand audiovisual speech by matching speakers to what they say and when they say it. Testing shows that Gemini 2.5 Pro significantly outperforms open-source competitors, with the gap attributed primarily to weaker audiovisual fusion in the open-source models rather than to limitations in visual perception.

🧠 Gemini