Abstract: Large vision-language models (LVLMs) are prone to factual hallucinations, such as describing non-existent entities or asserting false attributes, in image captioning and visual question answering, ...