AINeutralarXiv โ CS AI ยท Mar 44/102
๐ง
Real-Time Generation of Game Video Commentary with Multimodal LLMs: Pause-Aware Decoding Approaches
Researchers developed new prompting-based approaches using multimodal large language models to generate real-time video commentary that considers both content relevance and timing. The study introduces dynamic interval-based decoding that adjusts prediction timing based on utterance duration, showing improved alignment with human commentary patterns without requiring model fine-tuning.