arXiv:2306.11222 Abstract | arXiv Analytics

arXiv:2306.11222 [cs.LG]

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao

Published 2023-06-20, Version 1

Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memories and computational resources. To reduce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse approximation), a novel model compression technique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. Our method combines the advantages of both low-rank approximations and pruning, while avoiding their limitations. Low-rank approximation compresses the coherent and expressive parts in neurons, while pruning removes the incoherent and non-expressive parts in neurons. Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons. We evaluate our method on natural language understanding, question answering, and natural language generation tasks. We show that it significantly outperforms existing compression methods.
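The core idea of writing a weight matrix as the sum of a low-rank matrix and a sparse matrix can be illustrated with a small NumPy sketch. The abstract does not say how the two parts are computed, so everything here is an assumption made for illustration: the low-rank part is taken from a truncated SVD, and the sparse part keeps only the largest-norm columns of the residual (one simple form of structured sparsity). The function name `low_rank_plus_sparse` and the parameters `rank` and `keep_cols` are hypothetical and do not come from the paper.

```python
import numpy as np


def low_rank_plus_sparse(W, rank, keep_cols):
    """Approximate W as L + S (illustrative only, not the authors' procedure).

    L: best rank-`rank` approximation of W, from a truncated SVD (assumption).
    S: residual W - L with all but the `keep_cols` largest-norm columns
       zeroed out, i.e. a column-structured sparse matrix (assumption).
    """
    if keep_cols < 1:
        raise ValueError("keep_cols must be at least 1")

    # Truncated SVD: keep only the top `rank` singular triplets.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

    # Rank the residual's columns by Euclidean norm and keep the largest ones.
    residual = W - L
    col_norms = np.linalg.norm(residual, axis=0)
    kept = np.argsort(col_norms)[-keep_cols:]

    S = np.zeros_like(residual)
    S[:, kept] = residual[:, kept]
    return L, S


# Toy example on a random matrix standing in for a weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
L, S = low_rank_plus_sparse(W, rank=4, keep_cols=8)

err_low_rank = np.linalg.norm(W - L)        # Frobenius error, low-rank only
err_combined = np.linalg.norm(W - (L + S))  # Frobenius error, low-rank + sparse
print(f"low-rank only: {err_low_rank:.3f}, low-rank + sparse: {err_combined:.3f}")
```

In this toy setup the combined error can never exceed the low-rank-only error, because every kept column of the residual is reproduced exactly. The method described in the abstract is evaluated on fine-tuned Transformer models, which a one-shot decomposition like this does not capture.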

Categories: cs.LG, cs.CL

Keywords: large language models, sparse approximation, low-rank approximation, structured compression, natural language generation tasks

Related articles:

arXiv:2303.02206 [cs.LG] (Published 2023-03-03, updated 2023-08-23)

Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models

Navid Madani, Rohini K. Srihari, Kenneth Joseph

arXiv:2306.07567 [cs.LG] (Published 2023-06-13)

Large Language Models Sometimes Generate Purely Negatively-Reinforced Text

Fabien Roger

arXiv:2306.04634 [cs.LG] (Published 2023-06-07)

On the Reliability of Watermarks for Large Language Models

John Kirchenbauer et al.
