RewardHackWatch is an open-source reward-hacking detection tool for LLM agents. It detects when AI agents learn to game their reward signals and tracks whether these behaviors generalize to broader ...
Container instances. Calling docker run on an OCI image results in the allocation of system resources to create a ...