{ "id": "2006.10562", "version": "v1", "published": "2020-06-18T14:11:27.000Z", "updated": "2020-06-18T14:11:27.000Z", "title": "Uncertainty in Gradient Boosting via Ensembles", "authors": [ "Aleksei Ustimenko", "Liudmila Prokhorenkova", "Andrey Malinin" ], "categories": [ "cs.LG", "stat.ML" ], "abstract": "Gradient boosting is a powerful machine learning technique that is particularly successful for tasks containing heterogeneous features and noisy data. While gradient boosting classification models return a distribution over class labels, regression models typically yield only point predictions. However, for many practical, high-risk applications, it is also important to be able to quantify uncertainty in the predictions to avoid costly mistakes. In this work, we examine a probabilistic ensemble-based framework for deriving uncertainty estimates in the predictions of gradient boosting classification and regression models. Crucially, the proposed approach allows the total uncertainty to be decomposed into \\textit{data uncertainty}, which comes from the complexity and noise in the data distribution, and \\textit{knowledge uncertainty}, which comes from the lack of information about a given region of the feature space. Two approaches for generating ensembles are considered: Stochastic Gradient Boosting (SGB) and Stochastic Gradient Langevin Boosting (SGLB). Notably, SGLB also enables the generation of a \\emph{virtual} ensemble via only one gradient boosting model, which significantly reduces complexity. 
Experiments on a range of regression and classification datasets show that ensembles of gradient boosting models yield improved predictive performance, and measures of uncertainty successfully enable detection of out-of-domain inputs.", "revisions": [ { "version": "v1", "updated": "2020-06-18T14:11:27.000Z" } ], "analyses": { "keywords": [ "uncertainty", "gradient boosting classification models return", "predictions", "stochastic gradient langevin", "regressions models typically yield" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }