arXiv:2007.15707 [stat.ML]

Solar: a least-angle regression for accurate and stable variable selection in high-dimensional data

Ning Xu, Timothy C. G. Fisher, Jian Hong

Published 2020-07-30, Version 1

We propose a new least-angle regression algorithm for variable selection in high-dimensional data, called subsample-ordered least-angle regression (solar). Solar relies on the average $L_0$ solution path computed across subsamples and largely alleviates several known high-dimensional issues with least-angle regression. Using examples based on directed acyclic graphs, we illustrate the advantages of solar in comparison to least-angle regression, forward regression and variable screening. Simulations demonstrate that, with a similar computation load, solar yields substantial improvements over two lasso solvers (least-angle regression for lasso and coordinate descent) in terms of the sparsity (37-64% reduction in the average number of selected variables), stability and accuracy of variable selection. Simulations also demonstrate that solar enhances the robustness of variable selection to different settings of the irrepresentable condition and to variations in the dependence structures assumed in regression analysis. We provide a Python package, solarpy, for the algorithm.
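The idea of averaging a variable-inclusion order across subsamples can be illustrated with a small sketch. This is not the solar algorithm as specified in the paper and it does not use the solarpy API. It swaps in greedy forward selection as a stand-in for least-angle regression, and the scoring rule (a variable entering at step r of p gets (p - r) / p) is an assumption made here for illustration. The function names forward_order and average_inclusion_score are hypothetical.

```python
import random


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _center(v):
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def forward_order(X, y):
    """Order in which greedy forward selection adds the columns of X.

    Stand-in for a least-angle regression path (an assumption of this sketch).
    """
    p = len(X[0])
    cols = [_center([row[j] for row in X]) for j in range(p)]
    resid = _center(y)
    remaining = list(range(p))
    order = []
    while remaining:
        # Pick the remaining column most correlated with the current residual.
        best, best_score = remaining[0], -1.0
        for j in remaining:
            norm2 = _dot(cols[j], cols[j])
            score = 0.0 if norm2 < 1e-12 else _dot(cols[j], resid) ** 2 / norm2
            if score > best_score:
                best, best_score = j, score
        order.append(best)
        remaining.remove(best)
        b = cols[best]
        b2 = _dot(b, b)
        if b2 < 1e-12:
            continue  # column is numerically zero; nothing to project out
        # Project the chosen direction out of the residual and remaining columns.
        coef = _dot(resid, b) / b2
        resid = [r - coef * x for r, x in zip(resid, b)]
        for j in remaining:
            c = _dot(cols[j], b) / b2
            cols[j] = [v - c * x for v, x in zip(cols[j], b)]
    return order


def average_inclusion_score(X, y, n_subsamples=20, frac=0.8, seed=0):
    """Average a rank-based score for each variable over random subsamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    size = max(2, int(frac * n))
    totals = [0.0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), size)  # subsample rows without replacement
        order = forward_order([X[i] for i in idx], [y[i] for i in idx])
        for rank, j in enumerate(order):
            totals[j] += (p - rank) / p  # earlier entry gives a higher score
    return [t / n_subsamples for t in totals]


# Synthetic data: only columns 0 and 3 carry signal.
data_rng = random.Random(1)
n_obs, n_vars = 60, 8
X = [[data_rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_obs)]
y = [3 * row[0] - 2 * row[3] + data_rng.gauss(0, 0.5) for row in X]

scores = average_inclusion_score(X, y)
ranked = sorted(range(n_vars), key=lambda j: -scores[j])
print(ranked, [round(s, 3) for s in scores])
```

Averaging the entry order over subsamples means that a variable which enters early in only a few subsamples receives a middling score, while a variable that enters early in nearly all of them scores close to 1. A selection rule would then threshold these averaged scores; the choice of threshold is left open in this sketch.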

Related articles:
arXiv:2207.00367 [stat.ML] (Published 2022-07-01)
A geometric framework for outlier detection in high-dimensional data
arXiv:2103.13787 [stat.ML] (Published 2021-03-25)
Interpretable Approximation of High-Dimensional Data
arXiv:2101.09174 [stat.ML] (Published 2021-01-22)
Sparsistent filtering of comovement networks from high-dimensional data