Abstract: How to effectively interact audio with vision has garnered considerable interest within the multi-modality research field. Recently, a novel audio-visual video segmentation (AVS) task has ...