arXiv:2003.10296 Abstract | arXiv Analytics

arXiv:2003.10296 [cs.CL]

Adaptive Name Entity Recognition under Highly Unbalanced Data

Published 2020-03-10, Version 1

For several purposes in Natural Language Processing (NLP), such as Information Extraction, Sentiment Analysis, or chatbots, Named Entity Recognition (NER) plays an important role, as it identifies entities in text and categorizes them into predefined groups such as the names of persons, locations, organizations, quantities, or percentages. In this report, we present our experiments on a neural architecture composed of a Conditional Random Field (CRF) layer stacked on top of a Bi-directional LSTM (BiLSTM) layer for solving NER tasks. In addition, we employ a fused input of embedding vectors (GloVe, BERT) pre-trained on huge corpora to boost the generalization capacity of the model. Unfortunately, because of the heavily unbalanced distribution across the training data, both approaches performed poorly on classes with few training samples. To overcome this challenge, we introduce an add-on classification model that splits sentences into two sets, Weak and Strong classes, and then design a pair of BiLSTM-CRF models tailored to optimize performance on each set. We evaluated our models on the test set and found that our method significantly improves performance on the Weak classes, even though they account for a very small share of the data (approximately 0.45%) compared to the remaining classes.
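The abstract gives no implementation details. As an illustration only, the following minimal Python sketch shows two pieces of the described pipeline under stated assumptions. The first is Viterbi decoding, which is how a linear-chain CRF layer selects the best tag sequence from per-token emission scores (here standing in for BiLSTM outputs) and tag-to-tag transition scores. The second is a dispatch step that sends each sentence to one of two taggers depending on an add-on Weak/Strong classifier. The function names `viterbi_decode` and `route`, the toy scores, and the three-tag BIO label set are hypothetical and not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]:   score of tag j at token t (e.g. BiLSTM outputs).
    transitions[i][j]: score of moving from tag i to tag j.
    start[j]:          score of beginning the sentence with tag j.
    """
    n_tags = len(start)
    # score[j] = best score of any partial path ending in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Pick the previous tag i that maximizes the path score into tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Start from the best final tag and follow the backpointers to the first token.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


def route(
    sentence: str,
    is_weak: Callable[[str], bool],
    weak_model: Callable[[str], object],
    strong_model: Callable[[str], object],
) -> object:
    """Hypothetical dispatch: the add-on classifier chooses which tagger runs."""
    return weak_model(sentence) if is_weak(sentence) else strong_model(sentence)


if __name__ == "__main__":
    tags = ["O", "B-PER", "I-PER"]  # hypothetical BIO label set
    emissions = [[0.1, 2.0, 0.0], [0.5, 0.0, 1.0], [2.0, 0.0, 0.3]]
    # Large negative scores forbid O -> I-PER and starting with I-PER.
    transitions = [[0.0, 0.0, -10.0], [0.0, -1.0, 1.0], [0.0, -1.0, 0.5]]
    start = [0.0, 0.0, -10.0]
    path, best = viterbi_decode(emissions, transitions, start)
    print([tags[k] for k in path], best)  # ['B-PER', 'I-PER', 'O'] 6.0
```

The transition matrix is what lets a CRF reject invalid label sequences (such as `O` followed by `I-PER`) that independent per-token classification could produce. In a trained model these scores would be learned rather than set by hand.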

Categories: cs.CL, cs.AI, cs.LG, stat.ML

Keywords: highly unbalanced data, performance, add-on classification model, small data set, heavily unbalanced distribution across training data

Related articles:

arXiv:1909.09292 [cs.CL] (Published 2019-09-20)

BERT Meets Chinese Word Segmentation

Haiqin Yang

arXiv:2011.00425 [cs.CL] (Published 2020-11-01)

Analyzing the Effect of Multi-task Learning for Biomedical Named Entity Recognition

Arda Akdemir, Tetsuo Shibuya

arXiv:1612.02482 [cs.CL] (Published 2016-12-07)

Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages

Krupakar Hans, R S Milton
