Dataset Visual Basic ES

Nguyen, N.H., Vo, D.T.D., Van Nguyen, K. and Nguyen, N.L. (2023) OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese. Information ...

ABSTRACT: This study proposes a multimodal AI model for classifying Vietnamese digital learning materials by integrating three key information sources: text content, image and graphic features, and ...

IEEE

AVCaps: An Audio-Visual Dataset With Modality-Specific Captions

Abstract: This paper introduces AVCaps, an audio-visual dataset that contains separate textual captions for the audio, visual, and audio-visual contents of video clips. The dataset contains 2061 video ...

EurekAlert!

New framework enhances remote sensing image fusion with frequency-independent feature learning

A research team led by Prof. XIE Chengjun and ZHANG Jie from the Hefei Institutes of Physical Science of the Chinese Academy of Sciences, developed a frequency domain-independent feature learning ...

marktechpost

Google DeepMind Introduces DeepMind Control Vision Benchmark (DMC-VB): A Dataset and Benchmark to Evaluate the Robustness of Offline Reinforcement Learning Agents to Visual ...

Reinforcement learning (RL) provides a framework for learning behaviors for control and making decisions (known as policies) that help the model earn the most rewards in a given environment. Online RL ...

eLife

Dynamic organization of visual cortical networks inferred from massive spiking datasets

Getty Images drops ‘cleanest’ visual dataset for training foundation models

Extremely basic Dataset Tagging Tool

Visual Localization using other datasets for evaluation