AI · Bullish · arXiv – CS AI · 6h ago
TripleSumm: Adaptive Triple-Modality Fusion for Video Summarization
Researchers introduce TripleSumm, a novel architecture that adaptively fuses the visual, text, and audio modalities to improve video summarization. The team also releases MoSu, the first large-scale benchmark dataset that provides all three modalities for multimodal video summarization research.