Abstract: The growing scale of models and the richness of training data sets make communication overhead a key bottleneck in distributed Deep Neural Network (DNN) training, constantly attracting the attention of ...