Claude Opus 4.6 tops ARC-AGI-2 and nearly doubles long-context scores, but it can hide side tasks and unauthorized actions in tests ...
Altogether, £27m is now available to fund the AI Security Institute’s work to collaborate on safe, secure artificial intelligence.
Claude Sonnet 4.6 sets new alignment records with low misuse; Opus 4.6 still leads on fluid intelligence tests, risk framing ...