arXiv:2211.05853 [cs.CL]

Measuring Reliability of Large Language Models through Semantic Consistency

Harsh Raj, Domenic Rosati, Subhabrata Majumdar

Published 2022-11-10 (Version 1)

While large pretrained language models (PLMs) demonstrate incredible fluency and performance on many natural language tasks, recent work has shown that well-performing PLMs are very sensitive to what prompts are fed into them. Even when prompts are semantically identical, language models may give very different answers. When considering safe and trustworthy deployments of PLMs, we would like their outputs to be consistent under prompts that mean the same thing or convey the same intent. While some work has looked into how state-of-the-art PLMs address this need, it has been limited to evaluating lexical equality of single- or multi-word answers and does not address the consistency of generative text sequences. To understand the consistency of PLMs in text generation settings, we develop a measure of semantic consistency that allows the comparison of open-ended text outputs. We implement several versions of this consistency metric to evaluate the performance of a number of PLMs on paraphrased versions of questions in the TruthfulQA dataset. We find that our proposed metrics are considerably more consistent than traditional metrics embodying lexical consistency, and that they correlate with human evaluation of output consistency to a higher degree.
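The listing does not spell out how the semantic consistency measure is computed. As a minimal illustrative sketch (not the authors' implementation), one could embed each generation with a sentence encoder and average pairwise cosine similarities; the embedding model, function names, and aggregation choice below are assumptions.

# Illustrative sketch: a semantic consistency score for a set of generations
# produced from paraphrased prompts. The embedding backend and the
# mean-pairwise-cosine aggregation are assumptions, not the paper's exact method.
from itertools import combinations

import numpy as np
from sentence_transformers import SentenceTransformer  # assumed embedding backend

def semantic_consistency(generations: list[str], model_name: str = "all-MiniLM-L6-v2") -> float:
    """Return the mean pairwise cosine similarity of the generations' embeddings."""
    model = SentenceTransformer(model_name)
    embeddings = model.encode(generations)  # shape: (n_generations, dim)
    scores = []
    for i, j in combinations(range(len(generations)), 2):
        a, b = embeddings[i], embeddings[j]
        scores.append(float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))))
    return float(np.mean(scores))

# Example: outputs sampled from a PLM for paraphrases of the same TruthfulQA question.
outputs = [
    "The Great Wall of China is not visible from space with the naked eye.",
    "You cannot see the Great Wall from space without aid.",
    "Astronauts report the Great Wall is not visible to the naked eye from orbit.",
]
print(semantic_consistency(outputs))  # closer to 1.0 means more semantically consistent

A score near 1.0 indicates the model answers paraphrased prompts with semantically similar text, which is the property the paper's metrics are designed to capture more faithfully than lexical-equality checks.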

Comments: Accepted and presented at the NeurIPS 2022 ML Safety Workshop, https://neurips2022.mlsafety.org
Categories: cs.CL, cs.AI, cs.CY
Related articles:
arXiv:2205.00445 [cs.CL] (Published 2022-05-01)
MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning
Ehud Karpas et al.
arXiv:2206.11484 [cs.CL] (Published 2022-06-23)
Towards WinoQueer: Developing a Benchmark for Anti-Queer Bias in Large Language Models
arXiv:2211.10438 [cs.CL] (Published 2022-11-18)
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models