Abstract: Deep learning models for action recognition remain challenged by scene biases as they prioritize easily learnable scene representations over the actual actor’s motion patterns. To address ...
Abstract: Human–Object Interaction Detection (HOID) has benefited greatly from advances in modern detection architectures and vision-language foundation models. In this paper, we present two ...