arXiv:2309.10668 [cs.LG]

Language Modeling Is Compression

Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness

Published 2023-09-19 (Version 1)

It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
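To make the prediction-compression equivalence concrete, the sketch below shows the standard construction the abstract alludes to: any model that outputs next-symbol probabilities defines an arithmetic code, and its code length is the model's log-loss. This is not the paper's implementation (which feeds Chinchilla's token probabilities into a finite-precision arithmetic coder); here a hypothetical toy predictor over a two-symbol alphabet stands in for the language model, exact Fractions replace bit-level renormalisation, and the helper names (predict, interval, encode, decode) are illustrative only. The "code" returned is an exact rational rather than an emitted bitstream.

```python
# Minimal sketch of prediction -> lossless compression via arithmetic coding.
# Assumption: a toy fixed predictor stands in for a large language model.
from fractions import Fraction
import math

ALPHABET = "ab"

def predict(context):
    """Toy predictive model: p('a')=3/4, p('b')=1/4, independent of context."""
    return {"a": Fraction(3, 4), "b": Fraction(1, 4)}

def interval(symbol, probs):
    """Sub-interval [lo, hi) of [0, 1) assigned to `symbol` under `probs`."""
    lo = Fraction(0)
    for s in ALPHABET:
        if s == symbol:
            return lo, lo + probs[s]
        lo += probs[s]
    raise ValueError(symbol)

def encode(sequence):
    """Narrow [lo, hi) once per symbol; any number inside identifies the sequence."""
    lo, hi = Fraction(0), Fraction(1)
    for i, sym in enumerate(sequence):
        probs = predict(sequence[:i])
        s_lo, s_hi = interval(sym, probs)
        width = hi - lo
        lo, hi = lo + width * s_lo, lo + width * s_hi
    # Code length is roughly -log2 of the final interval width, i.e. the model's log-loss.
    bits = math.ceil(-math.log2(float(hi - lo))) + 1
    return (lo + hi) / 2, bits

def decode(code, length):
    """Invert the encoding by replaying the same predictive model."""
    out = []
    lo, hi = Fraction(0), Fraction(1)
    for _ in range(length):
        probs = predict("".join(out))
        width = hi - lo
        for sym in ALPHABET:
            s_lo, s_hi = interval(sym, probs)
            if lo + width * s_lo <= code < lo + width * s_hi:
                out.append(sym)
                lo, hi = lo + width * s_lo, lo + width * s_hi
                break
    return "".join(out)

msg = "aababaaaab"
code, bits = encode(msg)
assert decode(code, len(msg)) == msg
print(f"{len(msg)} symbols -> ~{bits} bits (the model's log-loss sets the size)")
```

Because the code length tracks the model's log-loss, a better predictor directly yields a smaller compressed size, which is why strong language models double as strong general-purpose compressors; running the construction in reverse (sampling symbols consistent with a compressor's code) gives the conditional generative model mentioned above.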

Related articles:
arXiv:2305.05176 [cs.LG] (Published 2023-05-09)
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
arXiv:2109.08668 [cs.LG] (Published 2021-09-17)
Primer: Searching for Efficient Transformers for Language Modeling
arXiv:2308.08614 [cs.LG] (Published 2023-08-16)
Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought