arXiv Analytics

arXiv:1910.04732 [cs.CL]

Structured Pruning of Large Language Models

Ziheng Wang, Jeremy Wohlwend, Tao Lei

Published 2019-10-10, Version 1

Large language models have recently achieved state-of-the-art performance across a wide variety of natural language tasks. Meanwhile, the size of these models and their latency have increased significantly, which makes them costly to use and raises an interesting question: do language models need to be large? We study this question through the lens of model compression. We present a novel, structured pruning approach based on low-rank factorization and augmented Lagrangian L0-norm regularization. Our structured approach achieves significant inference speedups while matching or outperforming our unstructured pruning baseline at various sparsity levels. We apply our method to state-of-the-art models on the enwik8 dataset and obtain a 1.19 perplexity score with just 5M parameters, vastly outperforming a model of the same size trained from scratch. We also demonstrate that our method can be applied to language model fine-tuning by pruning the BERT model on several downstream classification benchmarks.
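
To make the abstract's method description concrete, below is a minimal PyTorch sketch (not the authors' code) of the two ingredients it names: a weight matrix parameterized by a low-rank factorization W ≈ P diag(z) Q, where the gates z are trained with a differentiable L0 surrogate (a hard-concrete relaxation), and an augmented-Lagrangian-style term that steers the expected sparsity toward a target. The names (FactorizedLinear, sparsity_constraint), hyperparameter values, and gate parameterization details are illustrative assumptions, not identifiers taken from the paper.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hard-concrete relaxation constants commonly used for L0 regularization (assumed values).
BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1

class FactorizedLinear(nn.Module):
    # Parameterizes W as P @ diag(z) @ Q; pruning a gate z_k removes one rank-1 component,
    # shrinking both factors and yielding a genuinely smaller, faster matrix multiply.
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.P = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.Q = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.log_alpha = nn.Parameter(torch.zeros(rank))  # gate location parameters

    def sample_gates(self):
        # Stochastic hard-concrete gates during training, deterministic gates at eval time.
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / BETA)
        else:
            s = torch.sigmoid(self.log_alpha)
        return (s * (ZETA - GAMMA) + GAMMA).clamp(0.0, 1.0)

    def expected_nonzero(self):
        # Expected number of strictly positive gates (the differentiable L0 surrogate).
        return torch.sigmoid(self.log_alpha - BETA * math.log(-GAMMA / ZETA)).sum()

    def forward(self, x):
        z = self.sample_gates()              # shape (rank,)
        h = F.linear(x, self.Q) * z          # project to rank space, apply gates
        return F.linear(h, self.P)           # project back to out_features

def sparsity_constraint(layers, target_sparsity, lam1, lam2):
    # Augmented-Lagrangian-style term: a linear multiplier lam1 (updated by gradient
    # ascent or a schedule) plus a quadratic penalty weighted by lam2, both acting on
    # the gap between expected sparsity and the target.
    total = sum(layer.log_alpha.numel() for layer in layers)
    nonzero = sum(layer.expected_nonzero() for layer in layers)
    gap = (1.0 - nonzero / total) - target_sparsity
    return lam1 * gap + lam2 * gap * gap

# Usage sketch: add the constraint to the task loss during pruning/fine-tuning.
layer = FactorizedLinear(512, 512, rank=128)
x = torch.randn(4, 512)
loss = layer(x).pow(2).mean() + sparsity_constraint([layer], target_sparsity=0.8,
                                                    lam1=0.1, lam2=1.0)
loss.backward()
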

Related articles:
arXiv:2212.06040 [cs.CL] (Published 2022-11-14)
Semantic Decomposition Improves Learning of Large Language Models on EHR Data
arXiv:2101.05783 [cs.CL] (Published 2021-01-14)
Persistent Anti-Muslim Bias in Large Language Models
arXiv:2102.02503 [cs.CL] (Published 2021-02-04)
Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models