Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in generating fluent, succinct, and precise text. Limited battery life and ...
Doe-1 is the first closed-loop autonomous driving model for unified perception, prediction, and planning. We formulate autonomous driving as a unified next-token generation problem and use observation ...
Given a set of bounding boxes with associated trajectories, our framework enables object and camera motion control in image-to-video generation by leveraging the knowledge present in a pre-trained ...