Abstract: The rapid deployment of intelligent applications on the edge cloud calls for efficient and responsive deep neural network inference, especially under bursts of inference requests.