arXiv:2003.10296 [cs.CL]

Adaptive Name Entity Recognition under Highly Unbalanced Data

Thong Nguyen, Duy Nguyen, Pramod Rao

Published 2020-03-10, Version 1

Named Entity Recognition (NER) plays an important role in several Natural Language Processing (NLP) applications, such as information extraction, sentiment analysis, and chatbots, because it identifies entities in text and assigns them to predefined categories such as persons, locations, quantities, organizations, or percentages. In this report, we present experiments on a neural architecture composed of a Conditional Random Field (CRF) layer stacked on top of a bidirectional LSTM (Bi-LSTM) layer for solving NER tasks. We also feed the model a fusion of embedding vectors (GloVe, BERT) pre-trained on large corpora to boost its generalization capacity. Unfortunately, because the class distribution of the training data is heavily unbalanced, both approaches perform poorly on classes with few training samples. To overcome this challenge, we introduce an add-on classification model that splits sentences into two sets, Weak and Strong classes, and then design a pair of Bi-LSTM-CRF models, each tuned to optimize performance on one set. We evaluated our models on the test set and found that our method significantly improves performance on the Weak classes while using a very small data set (approximately 0.45%) compared with the remaining classes.
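The two-stage idea in the abstract (a sentence-level classifier that routes each sentence to one of two taggers) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `make_two_stage_tagger`, the toy routing rule, and the toy taggers are hypothetical stand-ins, not the authors' code, and a real system would plug trained Bi-LSTM-CRF models in their place. Treating percentages as a Weak class is likewise only an example.

```python
from typing import Callable, List, Sequence

# Type aliases for the two kinds of components (hypothetical names).
Tagger = Callable[[Sequence[str]], List[str]]   # tokens -> one label per token
Router = Callable[[Sequence[str]], bool]        # tokens -> True if "Weak" sentence


def make_two_stage_tagger(is_weak: Router,
                          weak_tagger: Tagger,
                          strong_tagger: Tagger) -> Tagger:
    """Build a tagger that routes each sentence to one of two models."""
    def tag(tokens: Sequence[str]) -> List[str]:
        # Stage 1: the add-on classifier picks the sentence set.
        model = weak_tagger if is_weak(tokens) else strong_tagger
        # Stage 2: the model dedicated to that set labels the tokens.
        labels = model(tokens)
        if len(labels) != len(tokens):
            raise ValueError("tagger must return one label per token")
        return labels
    return tag


# Toy stand-ins (assumptions): rule-based placeholders for trained models.
def toy_is_weak(tokens: Sequence[str]) -> bool:
    # Pretend sentences containing a percentage belong to the Weak set.
    return any(t.endswith("%") for t in tokens)


def toy_weak_tagger(tokens: Sequence[str]) -> List[str]:
    return ["B-PERCENT" if t.endswith("%") else "O" for t in tokens]


def toy_strong_tagger(tokens: Sequence[str]) -> List[str]:
    return ["B-PER" if t[:1].isupper() else "O" for t in tokens]


tag = make_two_stage_tagger(toy_is_weak, toy_weak_tagger, toy_strong_tagger)
weak_result = tag(["Sales", "rose", "12%"])
strong_result = tag(["Alice", "left"])
```

Keeping the router and the two taggers as plain callables means each component can be trained and swapped independently, which matches the abstract's description of optimizing a separate model for each set.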

Related articles:
arXiv:1909.09292 [cs.CL] (Published 2019-09-20)
BERT Meets Chinese Word Segmentation
arXiv:2011.00425 [cs.CL] (Published 2020-11-01)
Analyzing the Effect of Multi-task Learning for Biomedical Named Entity Recognition
arXiv:1612.02482 [cs.CL] (Published 2016-12-07)
Improving the Performance of Neural Machine Translation Involving Morphologically Rich Languages