Abstract: We present UniAlign, a unified model that aligns an arbitrary number of modalities (e.g., image, text, audio, and 3D point cloud) through one encoder and a single training phase. Existing ...