arXiv:1601.03288 [cs.CL]

Predicting the Effectiveness of Self-Training: Application to Sentiment Classification

Vincent Van Asch, Walter Daelemans

Published 2016-01-13, Version 1

The goal of this paper is to investigate the connection between the performance gain that can be obtained by self-training and the similarity between the corpora used in this approach. Self-training is a semi-supervised technique designed to increase the performance of machine learning algorithms by automatically classifying instances of a task and adding these as additional training material to the same classifier. In the context of language processing tasks, this training material is mostly an (annotated) corpus. Unfortunately, self-training does not always lead to a performance increase, and whether it will is largely unpredictable. We show that the similarity between corpora can be used to identify those setups for which self-training can be beneficial. We consider this research a step in the process of developing a classifier that is able to adapt itself to each new test corpus it is presented with.
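
The self-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' setup: the tiny naive Bayes classifier, the whitespace tokenizer, the confidence `threshold`, the `max_rounds` cap, and the cosine-based `corpus_similarity` function are all assumptions introduced here, since the abstract does not specify the classifier or the similarity measure used in the paper.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple assumption)."""
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict_proba(self, text):
        """Return {label: posterior probability} for one text."""
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in self.labels:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            log_scores[label] = score
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


def self_train(texts, labels, unlabeled, threshold=0.75, max_rounds=5):
    """Add confidently self-labeled instances to the training set, then retrain."""
    texts, labels, pool = list(texts), list(labels), list(unlabeled)
    model = NaiveBayes().fit(texts, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for text in pool:
            proba = model.predict_proba(text)
            best = max(proba, key=proba.get)
            if proba[best] >= threshold:
                texts.append(text)
                labels.append(best)
                added += 1
            else:
                remaining.append(text)
        if added == 0:  # nothing confident enough: stop early
            break
        pool = remaining
        model = NaiveBayes().fit(texts, labels)
    return model, texts, labels


def corpus_similarity(corpus_a, corpus_b):
    """Cosine similarity of unigram counts (a stand-in measure, not the paper's)."""
    freq_a = Counter(tok for t in corpus_a for tok in tokenize(t))
    freq_b = Counter(tok for t in corpus_b for tok in tokenize(t))
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a)
    norm = math.sqrt(sum(v * v for v in freq_a.values())) * math.sqrt(
        sum(v * v for v in freq_b.values())
    )
    return dot / norm if norm else 0.0


labeled = ["good great film", "bad awful film"]
gold = ["pos", "neg"]
unlabeled = ["great good acting", "awful bad plot", "film"]
model, texts, labels = self_train(labeled, gold, unlabeled)
```

In this toy run, the two clearly polar reviews are added with their predicted labels, while the ambiguous instance `"film"` stays out of the training set. In the spirit of the abstract, a score such as `corpus_similarity(labeled, unlabeled)` could be computed before self-training to judge whether a given setup is likely to benefit; how well any particular measure predicts the gain is an empirical question that the paper itself addresses.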
