arXiv Analytics

arXiv:2209.09815 [cs.LG]

Integer Fine-tuning of Transformer-based Models

Mohammadreza Tayaranian, Alireza Ghaffari, Marzieh S. Tahaei, Mehdi Rezagholizadeh, Masoud Asgharian, Vahid Partovi Nia

Published 2022-09-20 | Version 1

Transformer-based models are used to achieve state-of-the-art performance on various deep learning tasks. Because transformer-based models have large numbers of parameters, fine-tuning them on downstream tasks is computationally intensive and energy-hungry. Automatic mixed-precision FP32/FP16 fine-tuning of such models has previously been used to lower the compute resource requirements. However, with recent advances in low-bit integer back-propagation, it is possible to further reduce the computation and memory footprint. In this work, we explore a novel integer training method that uses integer arithmetic for both forward propagation and gradient computation of linear, convolutional, layer-norm, and embedding layers in transformer-based models. Furthermore, we study the effect of various integer bit-widths to find the minimum required bit-width for integer fine-tuning of transformer-based models. We fine-tune BERT and ViT models on popular downstream tasks using integer layers. We show that 16-bit integer models match the floating-point baseline performance. Reducing the bit-width to 10 results in an average score drop of 0.5 points, and further reducing it to 8 results in an average score drop of 1.7 points.
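
To make the idea of integer arithmetic in a layer's forward pass concrete, the sketch below shows a b-bit symmetric per-tensor quantized linear layer. It is a minimal illustration, not the authors' implementation: the quantization scheme, function names, and bit-width handling here are assumptions chosen for clarity, and gradient computation (which the paper also performs in integer arithmetic) is omitted.

```python
import torch

def quantize(x: torch.Tensor, bits: int):
    """Symmetric per-tensor quantization of x to signed `bits`-bit integer values.
    Returns the integer-valued tensor and the scale needed to dequantize.
    (Illustrative assumption; the paper's quantizer may differ.)"""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 127 for 8 bits
    scale = x.abs().max().clamp(min=1e-8) / qmax   # map the largest magnitude to qmax
    x_int = torch.clamp(torch.round(x / scale), -qmax, qmax)
    return x_int, scale

def int_linear_forward(x: torch.Tensor, w: torch.Tensor, bits: int = 8):
    """Linear layer y = x @ w.T with the matrix product carried out on
    quantized integer values, then rescaled back to floating point."""
    x_int, s_x = quantize(x, bits)
    w_int, s_w = quantize(w, bits)
    y_int = x_int @ w_int.t()        # integer-valued accumulation
    return y_int * (s_x * s_w)       # dequantize the accumulated product

# Example: compare an 8-bit integer forward pass against the FP32 reference.
x = torch.randn(4, 16)
w = torch.randn(32, 16)
y_fp = x @ w.t()
y_q = int_linear_forward(x, w, bits=8)
print((y_fp - y_q).abs().max())
```

Varying the `bits` argument (e.g. 16, 10, 8) mirrors the bit-width sweep described in the abstract: wider integers track the floating-point result more closely, while narrower ones trade accuracy for lower compute and memory cost.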

Related articles: Most relevant | Search more
arXiv:2407.15414 [cs.LG] (Published 2024-07-22)
Weights Shuffling for Improving DPSGD in Transformer-based Models
arXiv:2304.10891 [cs.LG] (Published 2023-04-21)
Transformer-based models and hardware acceleration analysis in autonomous driving: A survey
arXiv:2311.13624 [cs.LG] (Published 2023-11-22)
A Theoretical Insight into Attack and Defense of Gradient Leakage in Transformer