Abstract: Efficient talking face video coding and control are crucial in modern video communication, reshaping how individuals connect, collaborate, and interact. Coding seeks to reduce transmission ...
Previous research has investigated the application of Multimodal Large Language Models (MLLMs) in understanding 3D scenes by interpreting them as videos. These approaches generally depend on ...