Abstract: Half-precision matrix multiply has played a key role in the training of deep learning models. The newly designed Nvidia Tensor Cores offer the native instructions for half-precision small ...
Abstract: Sparse matrix multiplication is widely used in various practical applications. Different accelerators have been proposed to speed up sparse matrix-dense vector multiplication (SpMV), sparse ...