Abstract: Zero-shot skeleton-based action recognition aims to classify unseen skeleton-based human actions without prior exposure to such categories during training. This task is extremely challenging ...
Abstract: Image-text retrieval requires the system to bridge the heterogenous gap between vision and language for accurate retrieval while keeping the network lightweight-enough for efficient ...
A powerful, production-ready Streamlit web application for comprehensive LLM response evaluation and benchmarking. Features multi-dimensional scoring across 7 key criteria, interactive analytics ...
TL;DR: We propose ReAlign, a plug-and-play reward-guided alignment strategy for text-to-motion generation, which explicitly enhances both semantic consistency and motion realism throughout the ...