arXiv, CS AI · 6d ago
How Do Optical Flow and Textual Prompts Collaborate to Assist in Audio-Visual Semantic Segmentation?
Researchers introduce Stepping Stone Plus (SSP), a novel framework that combines optical flow and textual prompts to improve audio-visual semantic segmentation. The method outperforms existing approaches by using motion dynamics for moving sound sources and textual descriptions for stationary objects, with a visual-textual alignment module for better cross-modal integration.