arXiv Analytics


arXiv:2006.10518 [cs.LG]

Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming

Itay Hubara, Yury Nahshan, Yair Hanani, Ron Banner, Daniel Soudry

Published 2020-06-14 (Version 1)

Most of the literature on neural network quantization requires some training of the quantized model (fine-tuning). However, this training is not always possible in real-world scenarios, as it requires the full dataset. Lately, post-training quantization methods have gained considerable attention, as they are simple to use and require only a small, unlabeled calibration set. Yet, they usually incur significant accuracy degradation when quantizing below 8 bits. This paper addresses the problem by introducing two pipelines, advanced and light, where the former involves: (i) minimizing the quantization errors of each layer by optimizing its parameters over the calibration set; (ii) using integer programming to optimally allocate the desired bit-width for each layer while constraining accuracy degradation or model compression; and (iii) tuning the mixed-precision model statistics to correct biases introduced during quantization. While the light pipeline, which invokes only (ii) and (iii), obtains surprisingly accurate results, the advanced pipeline yields state-of-the-art accuracy-compression ratios for both vision and text models. For instance, on ResNet50 we obtain less than 1% accuracy degradation while compressing the model to 13% of its original size. We open-sourced our code.
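To make step (ii) concrete, the sketch below shows a minimal, self-contained bit-width allocation under a model-size budget, assuming the per-layer degradation for each candidate bit-width has already been measured on the calibration set. The function name `allocate_bits`, the candidate bit-widths, and the toy numbers are illustrative assumptions, not the paper's code; a small multiple-choice-knapsack dynamic program stands in here for a general integer-programming solver.

```python
# Hypothetical sketch of step (ii): pick one bit-width per layer so that the
# summed (pre-measured) degradation is minimized under a total-size budget.
# This is the multiple-choice knapsack view of the allocation problem, solved
# by a simple dynamic program instead of a general IP solver.
from typing import Dict, List, Tuple


def allocate_bits(
    layer_sizes: List[int],                 # number of parameters per layer
    degradation: List[Dict[int, float]],    # per layer: bit-width -> measured loss increase
    size_budget: int,                       # total budget in bits
    candidate_bits: Tuple[int, ...] = (2, 4, 8),
) -> List[int]:
    """Return one bit-width per layer minimizing total degradation within the budget."""
    n = len(layer_sizes)
    # dp maps used-bits -> (total degradation so far, chosen bit-widths so far)
    dp: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for i in range(n):
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for used, (cost, choice) in dp.items():
            for b in candidate_bits:
                new_used = used + layer_sizes[i] * b
                if new_used > size_budget:
                    continue
                new_cost = cost + degradation[i][b]
                if new_used not in nxt or new_cost < nxt[new_used][0]:
                    nxt[new_used] = (new_cost, choice + [b])
        dp = nxt
        if not dp:
            raise ValueError("size budget is infeasible even at the lowest bit-width")
    best_used = min(dp, key=lambda k: dp[k][0])
    return dp[best_used][1]


# Toy usage: three layers, budget of roughly 4 bits per weight on average.
sizes = [1000, 4000, 1000]
deg = [
    {2: 0.50, 4: 0.10, 8: 0.01},
    {2: 0.30, 4: 0.05, 8: 0.01},
    {2: 0.40, 4: 0.08, 8: 0.01},
]
print(allocate_bits(sizes, deg, size_budget=4 * sum(sizes)))  # -> [4, 4, 4]
```

In the full advanced pipeline, the degradation table would come from step (i), i.e. per-layer quantization errors measured after calibrating each layer's parameters on the small calibration set, and step (iii) would then correct the statistics of the resulting mixed-precision model.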

Related articles:
arXiv:2002.04679 [cs.LG] (Published 2020-02-11)
IPBoost -- Non-Convex Boosting via Integer Programming
arXiv:2410.18147 [cs.LG] (Published 2024-10-22)
MEC-IP: Efficient Discovery of Markov Equivalent Classes via Integer Programming
arXiv:1906.04859 [cs.LG] (Published 2019-06-11)
Reinforcement Learning for Integer Programming: Learning to Cut