The BoN Jailbreaking Rig is a human-in-the-loop platform designed for AI safety researchers to systematically test language model robustness against adversarial prompts. It combines automated prompt ...
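In the BoN (Best-of-N) jailbreaking literature, the core automated loop samples many randomly augmented variants of a prompt and stops at the first one that elicits a policy-violating response. Below is a minimal sketch of that loop under stated assumptions: the augmentations shown (random capitalization flips and adjacent-character swaps) are representative rather than the rig's exact set, and `query_model` and `is_jailbroken` are hypothetical callables standing in for the platform's model interface and success classifier.

```python
import random

def augment(prompt: str, rng: random.Random) -> str:
    """Apply BoN-style character-level augmentations:
    random capitalization flips plus a few adjacent swaps.
    These are illustrative; real rigs may use other perturbations."""
    chars = list(prompt)
    # Randomly flip the case of each character.
    chars = [c.upper() if rng.random() < 0.3 else c.lower() for c in chars]
    # Swap a handful of adjacent character pairs.
    for _ in range(max(1, len(chars) // 20)):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def bon_attack(prompt, query_model, is_jailbroken, n=100, seed=0):
    """Sample up to n augmented prompts; return the first success.

    query_model and is_jailbroken are hypothetical hooks: the former
    sends a candidate prompt to the target model, the latter judges
    whether the response constitutes a jailbreak.
    """
    rng = random.Random(seed)
    for attempt in range(n):
        candidate = augment(prompt, rng)
        response = query_model(candidate)
        if is_jailbroken(response):
            return attempt, candidate, response
    return None  # No success within the sampling budget.
```

In a human-in-the-loop setting such as this rig, the returned `(attempt, candidate, response)` triple would typically be surfaced to a researcher for review rather than acted on automatically.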