arXiv:2402.16617 [cs.CL]

Long-Context Language Modeling with Parallel Context Encoding

Howard Yen, Tianyu Gao, Danqi Chen

Published 2024-02-26, updated 2024-06-11 (version 2)

Extending large language models (LLMs) to process longer inputs is crucial for a wide range of applications. However, the substantial computational cost of transformers and the limited generalization of positional encoding restrict the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLM to extend its context window. CEPE employs a small encoder to process long inputs chunk by chunk, enabling the frozen decoder to utilize additional contexts via cross-attention. CEPE is efficient, generalizable, and versatile: trained with 8K-token documents, it extends the context window of LLAMA-2 to 128K tokens, offering 10x the throughput with only 1/6 of the memory. CEPE yields strong performance on language modeling and in-context learning. CEPE also excels in retrieval-augmented applications, while existing long-context models degenerate with retrieved contexts. We further introduce a CEPE variant that can extend the context window of instruction-tuned models using only unlabeled data, and showcase its effectiveness on LLAMA-2-CHAT, leading to a strong instruction-following model that can leverage very long contexts on downstream tasks.
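The core mechanism is simple enough to sketch. Below is a minimal, self-contained PyTorch illustration of the parallel-encoding idea described in the abstract: a small encoder processes the long context in fixed-size chunks that are independent of one another, and a cross-attention block lets decoder hidden states read the resulting representations. The module names, layer sizes, and chunk length here are our own toy choices, not the paper's actual configuration; the authors' real implementation (with a trained encoder and a frozen LLAMA-2 decoder) is in the linked repository.

```python
import torch
import torch.nn as nn

class ParallelContextEncoder(nn.Module):
    """Encode a long context chunk by chunk with a small encoder.

    Chunks are folded into the batch dimension and encoded independently,
    so positional encoding never has to generalize beyond the chunk length
    and memory/compute grow linearly with context length.
    """
    def __init__(self, vocab_size, d_model=256, n_layers=2, chunk_len=512):
        super().__init__()
        self.chunk_len = chunk_len
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, context_ids):
        # context_ids: (batch, total_len); pad to a multiple of chunk_len
        b, t = context_ids.shape
        pad = (-t) % self.chunk_len
        if pad:
            context_ids = nn.functional.pad(context_ids, (0, pad))
        # Fold chunks into the batch dimension: each chunk is encoded in
        # isolation ("parallel encoding").
        chunks = context_ids.view(b, -1, self.chunk_len).reshape(-1, self.chunk_len)
        hidden = self.encoder(self.embed(chunks))     # (b * n_chunks, chunk_len, d)
        return hidden.view(b, -1, hidden.size(-1))    # (b, padded_len, d)

class CrossAttentionBlock(nn.Module):
    """Cross-attention letting decoder states attend to encoder outputs.

    In CEPE, blocks like this are inserted into the frozen decoder so it
    can consume the encoded chunks; here it is shown standalone.
    """
    def __init__(self, d_model=256, nhead=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dec_hidden, enc_hidden):
        out, _ = self.attn(dec_hidden, enc_hidden, enc_hidden)
        return self.norm(dec_hidden + out)

# Toy usage with random inputs (stand-ins for a real tokenizer/decoder):
enc = ParallelContextEncoder(vocab_size=32000)
xattn = CrossAttentionBlock()
ctx = torch.randint(0, 32000, (1, 2048))   # long (e.g., retrieved) context
dec_hidden = torch.randn(1, 128, 256)      # hidden states from a frozen decoder
fused = xattn(dec_hidden, enc(ctx))
print(fused.shape)                         # torch.Size([1, 128, 256])
```

Note that, per the abstract, only the encoder and the added cross-attention weights are trained while the decoder stays frozen, which is what allows the CEPE variant to extend instruction-tuned models like LLAMA-2-CHAT using unlabeled data alone.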

Comments: ACL 2024. Code, models, and data are available at https://github.com/princeton-nlp/CEPE. arXiv admin note: text overlap with arXiv:1912.01214 by other authors
Categories: cs.CL
Related articles:
arXiv:2410.23771 [cs.CL] (Published 2024-10-31)
What is Wrong with Perplexity for Long-context Language Modeling?
Lizhe Fang et al.
arXiv:2410.01651 [cs.CL] (Published 2024-10-02, updated 2025-01-27)
Efficient Length-Generalizable Attention via Causal Retrieval for Long-Context Language Modeling
arXiv:2311.09136 [cs.CL] (Published 2023-11-15)
RRescue: Ranking LLM Responses to Enhance Reasoning Over Context