Vision-Language Models for Vision Tasks: A Survey

Vision-Language Models Tutorial

Mapping the susceptibility of large language models to medical misinformation across clinical notes and social media: a cross-sectional benchmarking analysis

(a) The Windreich Department of Artificial Intelligence and Human Health, Mount Sinai Health System, New York, NY, USA; (b) The Hasso Plattner Institute for Digital Health at Mount Sinai, Mount Sinai Health ...

IEEE

Large Vision-Language Models are Generalist Solvers For Pathology Tasks

Abstract: Leveraging the powerful capabilities of large language models (LLMs), large vision-language models (LVLMs) can perform a wide variety of tasks based on input images and user instructions.

Hacker

PaddleOCR-VL-1.5: A 0.9B Vision-Language OCR Model Built for Real-World Documents

Among other things, launching AIModels.fyi ... Find the right AI model for your project - https://aimodels.fyi ...

GitHub

Ming-UniVision: Joint Image Understanding and Generation with a Continuous Unified Tokenizer

🌐 Ming-UniVision is a groundbreaking multimodal large language model (MLLM) that unifies vision understanding, generation, and editing within a single autoregressive next-token prediction (NTP) ...

IEEE

Evaluating and Mitigating Relationship Hallucinations in Large Vision-Language Models

Abstract: The issue of hallucinations is a prevalent concern in existing Large Vision-Language Models (LVLMs). Previous efforts have primarily focused on investigating object hallucinations, which can ...

GitHub

[ICLR 2026] InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models

We present the InternSVG family, an integrated data–benchmark–model suite. The InternSVG-8B model is available at Hugging Face. It is based on the InternVL3-8B model, incorporating SVG-specific tokens ...
