{
  "id": "2209.09815",
  "version": "v1",
  "published": "2022-09-20T16:02:28.000Z",
  "updated": "2022-09-20T16:02:28.000Z",
  "title": "Integer Fine-tuning of Transformer-based Models",
  "authors": [
    "Mohammadreza Tayaranian",
    "Alireza Ghaffari",
    "Marzieh S. Tahaei",
    "Mehdi Rezagholizadeh",
    "Masoud Asgharian",
    "Vahid Partovi Nia"
  ],
  "categories": [
    "cs.LG"
  ],
  "abstract": "Transformer based models are used to achieve state-of-the-art performance on various deep learning tasks. Since transformer-based models have large numbers of parameters, fine-tuning them on downstream tasks is computationally intensive and energy hungry. Automatic mixed-precision FP32/FP16 fine-tuning of such models has been previously used to lower the compute resource requirements. However, with the recent advances in the low-bit integer back-propagation, it is possible to further reduce the computation and memory foot-print. In this work, we explore a novel integer training method that uses integer arithmetic for both forward propagation and gradient computation of linear, convolutional, layer-norm, and embedding layers in transformer-based models. Furthermore, we study the effect of various integer bit-widths to find the minimum required bit-width for integer fine-tuning of transformer-based models. We fine-tune BERT and ViT models on popular downstream tasks using integer layers. We show that 16-bit integer models match the floating-point baseline performance. Reducing the bit-width to 10, we observe 0.5 average score drop. Finally, further reduction of the bit-width to 8 provides an average score drop of 1.7 points.",
  "revisions": [
    {
      "version": "v1",
      "updated": "2022-09-20T16:02:28.000Z"
    }
  ],
  "analyses": {
    "keywords": [
      "transformer-based models",
      "integer fine-tuning",
      "average score drop",
      "novel integer training method",
      "automatic mixed-precision fp32/fp16"
    ],
    "note": {
      "typesetting": "TeX",
      "pages": 0,
      "language": "en",
      "license": "arXiv",
      "status": "editable"
    }
  }
}