{ "id": "2301.10226", "version": "v1", "published": "2023-01-24T18:52:59.000Z", "updated": "2023-01-24T18:52:59.000Z", "title": "A Watermark for Large Language Models", "authors": [ "John Kirchenbauer", "Jonas Geiping", "Yuxin Wen", "Jonathan Katz", "Ian Miers", "Tom Goldstein" ], "comment": "12 pages in the main body. Code will be available at github.com/jwkirchenbauer/lm-watermarking", "categories": [ "cs.LG", "cs.CL", "cs.CR" ], "abstract": "Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of whitelist tokens before a word is generated, and then softly promoting use of whitelist tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.", "revisions": [ { "version": "v1", "updated": "2023-01-24T18:52:59.000Z" } ], "analyses": { "keywords": [ "large language models", "whitelist tokens", "efficient open-source algorithm", "multi-billion parameter model", "language model api" ], "tags": [ "github project" ], "note": { "typesetting": "TeX", "pages": 12, "language": "en", "license": "arXiv", "status": "editable" } } }