Tests that once challenged advanced AI models are now being solved with ease, making it harder for researchers to pinpoint what current systems are actually capable of.