Abstract: Thanks to the inherent spatial or sequential structure of image and text data, deep architectures such as convolutional neural networks (CNNs) and the Transformer have ...