Visual-language foundation models, like CLIP, learn generalized representations that enable zero-shot open-set classification. Few-shot adaptation methods, based on prompt tuning, have been shown to ...
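The zero-shot classification mechanism the abstract refers to works by comparing an image embedding against text embeddings of class prompts in a shared space. A minimal sketch of that comparison step, using random placeholder vectors in place of CLIP's actual image and text encoders (the prompt strings, dimension, and temperature here are illustrative assumptions, not the paper's values):

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # CLIP-style embeddings are compared on the unit hypersphere.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Placeholder embeddings standing in for the outputs of CLIP's
# text and image encoders (assumed dimension 512 for illustration).
rng = np.random.default_rng(0)
dim = 512
class_prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

text_emb = l2_normalize(rng.normal(size=(len(class_prompts), dim)))
image_emb = l2_normalize(rng.normal(size=(dim,)))

# Zero-shot, open-set classification: cosine similarity between the
# image embedding and each prompt embedding, scaled by a temperature
# and softmaxed into class probabilities. New classes can be added
# simply by adding new prompt strings.
temperature = 100.0
logits = temperature * (text_emb @ image_emb)
probs = np.exp(logits - logits.max())
probs /= probs.sum()
pred = class_prompts[int(np.argmax(probs))]
print(pred, probs)
```

Prompt-tuning methods for few-shot adaptation keep this pipeline but replace the hand-written prompt strings with learnable token embeddings optimized on the few labeled examples.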