
arXiv:1711.03953 [cs.CL]

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen

Published 2017-11-10, Version 1

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively.
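The rank argument behind the bottleneck can be checked numerically. The sketch below is only an illustration under stated assumptions, not the authors' code: the sizes `N`, `M`, `d`, `K` and all random matrices are toy values invented here. With context vectors H (N x d) and word embeddings W (M x d), a single Softmax gives log-probabilities equal to H W^T minus a per-row normalizer, so that matrix has rank at most d + 1. A mixture of several Softmaxes is used as one assumed example of a model whose log-probability matrix is not subject to that bound; the abstract itself does not spell out the proposed method.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


rng = np.random.default_rng(0)

# Toy sizes (assumptions for illustration only):
# N contexts, M vocabulary words, d embedding dimension, K mixture components.
N, M, d, K = 50, 40, 5, 3

W = rng.standard_normal((M, d))  # word embeddings, shared by both models
H = rng.standard_normal((N, d))  # one context vector per context

# Single Softmax: log P = H W^T - (row-wise log-normalizer).
# H W^T has rank <= d and the normalizer adds a rank-1 term, so rank <= d + 1.
logp_single = log_softmax(H @ W.T)
rank_single = np.linalg.matrix_rank(logp_single)

# Mixture of K Softmaxes (illustrative assumption): each component has its own
# context vectors, and each context has its own mixture weights.
H_k = rng.standard_normal((K, N, d))
prior = np.exp(log_softmax(rng.standard_normal((N, K))))  # rows sum to 1

probs_mix = np.zeros((N, M))
for k in range(K):
    probs_mix += prior[:, k:k + 1] * np.exp(log_softmax(H_k[k] @ W.T))

# The log of a weighted sum of Softmaxes is no longer an affine function of
# the embeddings, so the d + 1 bound does not apply.
logp_mix = np.log(probs_mix)
rank_mix = np.linalg.matrix_rank(logp_mix)

print("bound d + 1:", d + 1)
print("rank of single-Softmax log-prob matrix:", rank_single)
print("rank of mixture log-prob matrix:", rank_mix)
```

If the true log-probability matrix of natural language has rank well above d + 1, no choice of H and W in the single-Softmax model can reproduce it exactly, which is the capacity limit the abstract refers to.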

Categories: cs.CL, cs.LG

Keywords: high-rank rnn language model, softmax bottleneck, matrix factorization problem, neural language models, model natural language

Related articles:

arXiv:1708.00781 [cs.CL] (Published 2017-08-02)

Dynamic Entity Representations in Neural Language Models

Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, Noah A. Smith

arXiv:1811.00998 [cs.CL] (Published 2018-11-02)

Analysing Dropout and Compounding Errors in Neural Language Models

James O'Neill, Danushka Bollegala

arXiv:1901.00398 [cs.CL] (Published 2019-01-02)

Judge the Judges: A Large-Scale Evaluation Study of Neural Language Models for Online Review Generation

Cristina Garbacea, Samuel Carton, Shiyan Yan, Qiaozhu Mei