arXiv:1808.08755 [cs.LG]

Learning from Positive and Unlabeled Data under the Selected At Random Assumption

Published 2018-08-27, Version 1

For many interesting tasks, such as medical diagnosis and web page classification, a learner has access only to some positively labeled examples and many unlabeled examples. Learning from this type of data requires making assumptions about the true distribution of the classes and/or the mechanism that was used to select the positive examples to be labeled. The two commonly made assumptions, that the classes are separable and that the labeled positive examples are selected completely at random, are very strong. This paper proposes a weaker assumption: the positive examples are selected at random, conditioned on some of the attributes. To learn under this assumption, an EM method is proposed. Experiments show that our method is not only very capable of learning under this assumption, but also outperforms the state of the art for learning under the selected completely at random assumption.
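
The difference between the two labeling assumptions named in the abstract can be illustrated with a small simulation. The sketch below is not the paper's EM method; it only generates synthetic positive and unlabeled data under a constant labeling probability (selected completely at random) and under a labeling probability that depends on an attribute (selected at random). The function names `simulate_pu` and `mean_x`, the single attribute `x`, the 50% class prior, and both propensity functions are assumptions made up for this illustration.

```python
import random


def simulate_pu(n, propensity, seed=0):
    """Draw n examples (x, y, s): attribute x, true class y, observed label s.

    `propensity(x)` is the assumed probability that a *positive* example
    with attribute x is selected for labeling. Negatives are never labeled.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()                    # one attribute, uniform on [0, 1)
        y = 1 if rng.random() < 0.5 else 0  # true class, independent of x here
        s = 1 if y == 1 and rng.random() < propensity(x) else 0
        data.append((x, y, s))
    return data


def mean_x(data, keep):
    """Mean attribute value over the examples for which keep(y, s) is true."""
    xs = [x for x, y, s in data if keep(y, s)]
    return sum(xs) / len(xs)


# Selected completely at random: every positive has the same chance of a label.
scar = simulate_pu(20000, lambda x: 0.3)
# Selected at random: the chance of a label grows with the attribute x.
sar = simulate_pu(20000, lambda x: 0.1 + 0.8 * x)

for name, data in [("SCAR", scar), ("SAR", sar)]:
    all_pos = mean_x(data, lambda y, s: y == 1)
    labeled = mean_x(data, lambda y, s: s == 1)
    print(f"{name}: mean x over all positives = {all_pos:.3f}, "
          f"over labeled positives = {labeled:.3f}")
```

Under the constant propensity, the labeled positives look like a uniform subsample of all positives, so the two means should roughly agree. Under the attribute-dependent propensity, the labeled positives are skewed toward larger `x`, which is the kind of selection bias a learner has to correct for when the stronger assumption does not hold.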

Categories: cs.LG, stat.ML

Keywords: random assumption, unlabeled data, positive examples, web page classification, true distribution

Related articles:

arXiv:1809.03207 [cs.LG] (Published 2018-09-10)

Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data

Jessa Bekker, Jesse Davis

arXiv:1904.11717 [cs.LG] (Published 2019-04-26)

Classification from Pairwise Similarities/Dissimilarities and Unlabeled Data via Empirical Risk Minimization

Takuya Shimada, Han Bao, Issei Sato, Masashi Sugiyama

arXiv:1911.08696 [cs.LG] (Published 2019-11-20)

Where is the Bottleneck of Adversarial Learning with Unlabeled Data?

Jingfeng Zhang, Bo Han, Gang Niu, Tongliang Liu, Masashi Sugiyama