Visual-language foundation models, like CLIP, learn generalized representations that enable zero-shot open-set classification. Few-shot adaptation methods, based on prompt tuning, have been shown to ...