This is source code accompanying the paper of Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks by Han Wang, Gang Wang, and Huan Zhang. In ...
Language models are able to generate text, but when requiring a precise output format, they do not always perform as instructed. Various prompt engineering techniques have been introduced to improve ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results