arXiv Analytics

arXiv:1508.05154 [cs.CL]

Posterior calibration and exploratory analysis for natural language processing models

Khanh Nguyen, Brendan O'Connor

Published 2015-08-21 (Version 1)

Many models in natural language processing define probability distributions over linguistic structures. We argue that (1) the quality of a model's posterior distribution can and should be directly evaluated, as to whether its probabilities correspond to empirical frequencies, and (2) NLP uncertainty can be projected not only to pipeline components, but also to exploratory data analysis, telling a user when to trust, and when not to trust, the NLP analysis. We present a method to analyze calibration, and apply it to compare the miscalibration of several commonly used models. We also contribute a coreference sampling algorithm that can create confidence intervals for a political event extraction task.
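The calibration question in point (1) reduces to: when a model assigns probability p to an event, does the event actually occur about a fraction p of the time? As a minimal sketch of that general idea (not necessarily the paper's exact estimator; the function name `calibration_table`, the equal-width binning, and the `n_bins` parameter are all assumptions for illustration), the following Python groups predictions into probability bins and compares each bin's mean predicted probability to its empirical event frequency:

```python
import numpy as np

def calibration_table(probs, outcomes, n_bins=10):
    """Bin predicted probabilities and compare each bin's mean
    prediction to the empirical frequency of the outcome.

    Returns a per-bin table of (mean prediction, empirical frequency,
    count) plus a count-weighted average gap (an ECE-style summary).
    """
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    # Assign each prediction to an equal-width bin; clip so that a
    # probability of exactly 1.0 lands in the top bin.
    bins = np.clip((probs * n_bins).astype(int), 0, n_bins - 1)
    table, weighted_gap = [], 0.0
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        mean_pred = probs[mask].mean()    # average confidence in this bin
        emp_freq = outcomes[mask].mean()  # how often the event occurred
        table.append((mean_pred, emp_freq, int(mask.sum())))
        weighted_gap += abs(mean_pred - emp_freq) * mask.sum() / len(probs)
    return table, weighted_gap

# Usage on synthetic, perfectly calibrated data: the weighted gap
# should be near zero, and each bin's two columns should roughly agree.
rng = np.random.default_rng(0)
p = rng.uniform(size=10_000)
y = rng.uniform(size=10_000) < p
table, gap = calibration_table(p, y)
```

A well-calibrated model yields points near the diagonal (mean prediction approximately equal to empirical frequency in every bin); systematic deviation above or below the diagonal indicates under- or over-confidence, which is the kind of miscalibration the paper's comparison targets.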

Comments: 12 pages (including supplementary information) in EMNLP 2015
Categories: cs.CL
Related articles:
arXiv:2306.00168 [cs.CL] (Published 2023-05-31)
Measuring the Robustness of Natural Language Processing Models to Domain Shifts
arXiv:2304.00235 [cs.CL] (Published 2023-04-01)
What Does the Indian Parliament Discuss? An Exploratory Analysis of the Question Hour in the Lok Sabha
arXiv:2404.02408 [cs.CL] (Published 2024-04-03)
CMULAB: An Open-Source Framework for Training and Deployment of Natural Language Processing Models