arXiv:2211.02069 Abstract | arXiv Analytics


LMentry: A Language Model Benchmark of Elementary Language Tasks

Published 2022-11-03, Version 1

As the performance of large language models rapidly improves, benchmarks are getting larger and more complex as well. We present LMentry, a benchmark that avoids this "arms race" by focusing on a compact set of tasks that are trivial to humans, e.g. writing a sentence containing a specific word, identifying which words in a list belong to a specific category, or choosing which of two words is longer. LMentry is specifically designed to provide quick and interpretable insights into the capabilities and robustness of large language models. Our experiments reveal a wide variety of failure cases that, while immediately obvious to humans, pose a considerable challenge for large language models, including OpenAI's latest 175B-parameter instruction-tuned model, TextDavinci002. LMentry complements contemporary evaluation approaches of large language models, providing a quick, automatic, and easy-to-run "unit test", without resorting to large benchmark suites of complex tasks.
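The abstract describes LMentry as a quick, automatic "unit test". The minimal Python sketch below shows how two of the tasks named above could be scored without human judgment: "write a sentence containing a specific word" and "which of two words is longer". The function names (`score_contains_word`, `score_longer_word`, `run_unit_tests`), the whole-word regex criterion, and the pass-rate aggregation are illustrative assumptions. They are not LMentry's actual scoring code.

```python
import re

def score_contains_word(response: str, word: str) -> bool:
    """Hypothetical scorer: True if `word` appears in `response` as a whole word (case-insensitive)."""
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, response, flags=re.IGNORECASE) is not None

def score_longer_word(response: str, word_a: str, word_b: str) -> bool:
    """Hypothetical scorer: the response must name the longer word and must not name the shorter one."""
    if len(word_a) == len(word_b):
        raise ValueError("task is ill-posed when both words have the same length")
    longer, shorter = (word_a, word_b) if len(word_a) > len(word_b) else (word_b, word_a)
    return score_contains_word(response, longer) and not score_contains_word(response, shorter)

def run_unit_tests(cases) -> float:
    """Return the fraction of (scorer, args) cases whose scorer returns True."""
    passed = sum(1 for scorer, args in cases if scorer(*args))
    return passed / len(cases)

# Illustrative model responses (made up for this sketch, not real model outputs).
cases = [
    (score_contains_word, ("The cat sat on the mat.", "cat")),  # passes
    (score_contains_word, ("A category of things.", "cat")),    # fails: "cat" only occurs inside "category"
    (score_longer_word, ("elephant", "elephant", "ant")),       # passes
    (score_longer_word, ("ant", "elephant", "ant")),            # fails: names the shorter word
]

if __name__ == "__main__":
    print(run_unit_tests(cases))  # 0.5
```

Scoring by simple pattern checks keeps each test cheap and easy to interpret, which is the property the abstract emphasizes. A real scorer would likely also need to handle answer phrasing, refusals, and responses that hedge between both options.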

Comments: 24 pages, 2 figures

Categories: cs.CL, cs.AI, cs.LG

Keywords: large language models, elementary language tasks, language model benchmark, latest 175B-parameter instruction-tuned model, LMentry complements contemporary evaluation approaches

