Abstract: How to effectively interact audio with vision has garnered considerable interest within the multi-modality research field. Recently, a novel audio-visual video segmentation (AVS) task has ...
Video Frame Interpolation (VFI) is a fundamental Low-Level Vision (LLV) task that synthesizes intermediate frames between existing ones while maintaining spatial and temporal coherence. VFI techniques ...
The official implementation of NarVid — a framework that enhances text-video retrieval by leveraging frame-level captions (narration) to improve semantic understanding and retrieval accuracy. NarVid ...
Abstract: Semantic communication has emerged as one of the most promising paradigms for the upcoming 6G communication. In 6G communication, ultra-low latency video communication is necessary to meet ...