Abstract: Quantizing neural networks is an efficient model compression technique that converts weights and activations from floating-point to integer values. However, existing model quantization methods are ...
Abstract: Post-training quantization (PTQ) has been widely studied in recent years because it requires neither retraining the network nor access to the entire training dataset. However, naively applying the PTQ ...
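
To make the float-to-integer conversion and the "no retraining" property concrete, here is a minimal sketch of naive post-training quantization. It is not the method of either abstract above. It assumes per-tensor asymmetric quantization to unsigned 8-bit integers, with the scale and zero point taken from the min/max of a small calibration sample. The function names `calibrate`, `quantize`, and `dequantize` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def calibrate(values: List[float], num_bits: int = 8) -> Tuple[float, int]:
    """Derive (scale, zero_point) from the min/max of calibration values.

    Assumption: naive min/max calibration, with no retraining and no labels.
    """
    qmax = 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0  # degenerate case: every value is zero
    scale = (hi - lo) / qmax
    zero_point = int(round(-lo / scale))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int,
             num_bits: int = 8) -> List[int]:
    """Map floats to integers in [0, 2**num_bits - 1], clipping out-of-range values."""
    qmax = 2 ** num_bits - 1
    return [min(max(int(round(v / scale)) + zero_point, 0), qmax) for v in values]


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to approximate floats."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-0.8, 0.0, 0.3, 1.2]      # stand-in for a trained weight tensor
    scale, zp = calibrate(weights)        # uses a small sample, no training loop
    q = quantize(weights, scale, zp)
    print(q, dequantize(q, scale, zp))
```

In this sketch the round-trip error for values inside the calibrated range stays on the order of one quantization step (`scale`). Values outside that range are clipped, which is one reason naive min/max calibration can lose accuracy.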