ScreenSpot is an evaluation benchmark for GUI grounding, comprising over 1200 instructions from iOS, Android, macOS, Windows and Web environments, along with annotated element types (Text or ...
A fundamental challenge for GUI agents is robustly grounding natural language instructions, which requires not only precise spatial alignment (locating elements accurately) but also correct semantic ...