The BoN Jailbreaking Rig is a human-in-the-loop platform designed for AI safety researchers to systematically test language model robustness against adversarial prompts. It combines automated prompt ...
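As a rough illustration of the Best-of-N (BoN) idea behind such a rig, the sketch below samples up to `n` character-level augmentations of a seed prompt and stops at the first one a classifier flags as a successful jailbreak. This is a minimal, self-contained sketch under stated assumptions, not the rig's actual implementation: `query_model` and `is_jailbroken` are hypothetical callables standing in for the target model wrapper and the harmfulness judge.

```python
import random
from typing import Callable, Optional, Tuple


def augment_prompt(prompt: str, rng: random.Random) -> str:
    """Apply simple character-level augmentations: random capitalization
    and a few adjacent-character swaps (illustrative only)."""
    chars = list(prompt)
    # Randomly flip the case of each character.
    chars = [c.upper() if rng.random() < 0.6 else c.lower() for c in chars]
    # Swap a handful of neighbouring characters to lightly scramble the text.
    if len(chars) > 1:
        for _ in range(max(1, len(chars) // 20)):
            i = rng.randrange(len(chars) - 1)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def best_of_n_attack(
    prompt: str,
    query_model: Callable[[str], str],      # hypothetical target-model wrapper
    is_jailbroken: Callable[[str], bool],   # hypothetical harmfulness classifier
    n: int = 100,
    seed: int = 0,
) -> Optional[Tuple[str, str]]:
    """Sample up to n augmented prompts; return the first (prompt, response)
    pair judged to be a successful jailbreak, or None if none succeed."""
    rng = random.Random(seed)
    for _ in range(n):
        candidate = augment_prompt(prompt, rng)
        response = query_model(candidate)
        if is_jailbroken(response):
            return candidate, response
    return None
```

In a human-in-the-loop setting, the loop would typically surface candidate (prompt, response) pairs to a researcher for review rather than relying solely on the automated classifier.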