Abstract: Visual Dialog is a challenging multimodal task that requires models to answer questions about images over multi-turn conversations. Despite significant progress, research has predominantly ...