Vision-Language Pre-Training Methods

Vision-language-action models are the next leap in autonomous robotics

Explore how vision-language-action models like Helix, GR00T N1, and RT-1 are enabling robots to understand instructions and act autonomously.

Gizmochina

Screen Addiction Is Hurting Your Eyes?: AI-Powered Vision Training Glasses Can Help You

Smartphone use now exceeds three hours a day on average, while total daily screen time for many adults crosses six hours. This constant close-up focus has made eye fatigue, dryness, blurred vision, ...

The Robot Report

Microsoft Research reveals Rho-alpha vision-language-action model for robots

To be useful in more dynamic and less structured environments, robots need artificial intelligence trained on a variety of sensory inputs. Microsoft Corp. today announced Rho-alpha, or ρα, the first ...

VentureBeat

New ‘Test-Time Training’ method lets AI keep learning without exploding inference costs

A new study from researchers at Stanford University and Nvidia proposes a way for AI models to keep learning after deployment — without increasing inference costs. For enterprise agents that have to ...

Bloomberg L.P.

DeepSeek Touts New Training Method as China Pushes AI Efficiency

DeepSeek published a paper outlining a more efficient approach to developing AI, illustrating the Chinese artificial intelligence industry’s effort to compete with the likes of OpenAI despite a lack ...

blockchain

Next-Token Prediction in Vision AI: New Training Method Drives 83.8% ImageNet Accuracy and Strong Transfer Learning

According to @SciTechera, a new AI training approach applies next-token prediction—commonly used in language models—to Vision AI by treating visual embeddings as sequential tokens. This method for ...

EurekAlert!

Breakthroughs in optical image processing powered by vision-language models

The field of optical image processing is undergoing a transformation driven by the rapid development of vision-language models (VLMs). A new review article published in iOptics details how these ...

EurekAlert!

Multimodal pre-training is driving the technological revolution in the field of drug discovery

With the great success of large language models, self-supervised pre-training technologies have shown great promise in the field of drug discovery. In particular, multimodal pre-training models ...

STAT

Off an $80 million acquisition, Cognita’s CEO on the power of scale in radiology AI

Katie Palmer covers telehealth, clinical artificial intelligence, and the health data economy — with an emphasis on the impacts of digital health care for patients, providers, and businesses. You can ...

blockchain

Gemini 3's Breakthrough: Enhanced AI Pre-Training and Post-Training Drive Major Performance Leap

According to @OriolVinyalsML, the key to Gemini 3’s remarkable progress lies in significant advancements in both pre-training and post-training of the model. Contrary to the popular belief that ...

IEEE

Local Alignment for Medical Vision-Language Pre-Training

Abstract: Establishing local semantic correspondences between medical images and their corresponding reports is crucial for effective medical vision-language pre-training. However, existing methods ...
