arXiv:2310.02041 [cs.LG]

The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers

Rickard Brännvall

Published 2023-10-03 (Version 1)

To enhance the computational efficiency of quantized Transformers, we replace the dot-product and Softmax-based attention with an alternative mechanism involving addition and ReLU activation only. This sidesteps the expansion to double precision often required by matrix multiplication and avoids costly Softmax evaluations, while maintaining much of the core functionality of conventional dot-product attention. It can enable more efficient execution and support larger quantized Transformer models on resource-constrained hardware or alternative arithmetic systems such as homomorphic encryption. Training experiments on four common benchmark tasks show test set prediction scores comparable to those of conventional Transformers with dot-product attention. Our scaling experiments also suggest significant computational savings, both in plaintext and under encryption. In particular, we believe that the ReLU and addition-based attention mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the costly multiplication of encrypted variables.
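The abstract does not spell out the exact mechanism, but a minimal sketch of an addition- and ReLU-based attention in this spirit is shown below. It assumes Manhattan-distance scores between queries and keys and a ReLU-gated ("inhibited") summation over values in place of the softmax-weighted average; the function name inhibitor_attention, the plaintext scaling parameter gamma, and the symmetric two-sided ReLU are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def inhibitor_attention(Q, K, V, gamma=1.0):
    """Illustrative sketch of addition/ReLU-based attention (not the paper's exact form).

    Q: (n_q, d), K: (n_k, d), V: (n_k, d_v)
    Scores are Manhattan distances (additions and absolute values only);
    each value is "inhibited" by the scaled score before a ReLU, and the
    results are summed over keys instead of softmax-averaged.
    """
    # Z[i, j] = sum_k |Q[i, k] - K[j, k]| -- no query-key multiplications
    Z = torch.cdist(Q, K, p=1)                          # (n_q, n_k)
    Zs = gamma * Z.unsqueeze(-1)                        # (n_q, n_k, 1), broadcast over d_v
    Vx = V.unsqueeze(0)                                 # (1, n_k, d_v)
    # Two-sided ReLU keeps the sign of V while shrinking it toward zero by the
    # inhibition score (an assumption made for this sketch).
    H = torch.relu(Vx - Zs) - torch.relu(-Vx - Zs)      # (n_q, n_k, d_v)
    return H.sum(dim=1)                                 # (n_q, d_v)
```

For example, with Q = torch.randn(8, 16), K = torch.randn(10, 16), and V = torch.randn(10, 16), the call returns an (8, 16) tensor. The only operations on activations are subtractions, absolute values, ReLUs, and sums, which is what makes such a scheme attractive for quantized arithmetic or homomorphic encryption, where ciphertext-ciphertext multiplication is the dominant cost.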

Related articles:
arXiv:2009.06732 [cs.LG] (Published 2020-09-14)
Efficient Transformers: A Survey
arXiv:2109.08668 [cs.LG] (Published 2021-09-17)
Primer: Searching for Efficient Transformers for Language Modeling
arXiv:2301.13310 [cs.LG] (Published 2023-01-30)
Alternating Updates for Efficient Transformers