arXiv Analytics

arXiv:1811.01564 [cs.LG]

Parallel training of linear models without compromising convergence

Nikolas Ioannou, Celestine Dünner, Kornilios Kourtis, Thomas Parnell

Published 2018-11-05, Version 1

In this paper we analyze, evaluate, and improve the performance of training generalized linear models on modern CPUs. We start from a state-of-the-art asynchronous parallel training algorithm, identify system-level performance bottlenecks, and apply optimizations that improve data parallelism, cache-line locality, and cache-line prefetching. These modifications reduce the per-epoch run-time significantly, but take a toll on algorithm convergence in terms of the number of epochs required. To alleviate these shortcomings of our systems-optimized version, we propose a novel dynamic data partitioning scheme across threads that allows us to approach the convergence of the sequential version. The combined set of optimizations results in a consistent bottom-line speedup in convergence of up to $12\times$ compared to the initial asynchronous parallel training algorithm, and up to $42\times$ compared to state-of-the-art implementations (scikit-learn and h2o), on a range of multi-core CPU architectures.
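To make the general idea concrete, the following is a minimal, illustrative sketch, not the authors' algorithm or implementation: a Hogwild-style asynchronous SGD loop for a linear model in which the training examples are re-partitioned across threads at every epoch rather than assigned statically. All names (`async_sgd`, `repartition`) and hyperparameters are hypothetical, and CPython's GIL means this sketch illustrates the structure of the scheme rather than actual multi-core speedups.

```python
# Hypothetical sketch: lock-free asynchronous SGD for ridge regression,
# with a fresh (dynamic) partition of examples per thread each epoch.
import numpy as np
import threading

def repartition(n_examples, n_threads, rng):
    """Shuffle example indices and split them into one chunk per thread."""
    perm = rng.permutation(n_examples)
    return np.array_split(perm, n_threads)

def async_sgd(X, y, n_threads=4, epochs=10, lr=0.01, lam=1e-3, seed=0):
    """Train a ridge-regression model with Hogwild-style SGD threads."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)  # shared weight vector, updated without locks

    def worker(indices):
        for i in indices:
            xi, yi = X[i], y[i]
            grad = (xi @ w - yi) * xi + lam * w   # squared-loss gradient
            w[:] -= lr * grad                     # racy in-place update

    for _ in range(epochs):
        # Dynamic partitioning: each thread gets a new slice of the data
        # every epoch instead of a fixed static assignment.
        parts = repartition(n, n_threads, rng)
        threads = [threading.Thread(target=worker, args=(p,)) for p in parts]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((1000, 20))
    w_true = rng.standard_normal(20)
    y = X @ w_true + 0.1 * rng.standard_normal(1000)
    w_hat = async_sgd(X, y)
    print("relative error:", np.linalg.norm(w_hat - w_true) / np.linalg.norm(w_true))
```

The per-epoch reshuffling stands in for the paper's dynamic partitioning idea: by changing which thread touches which examples over time, no thread's updates are permanently confined to one slice of the data, which is the kind of mechanism that can recover convergence behavior closer to the sequential algorithm.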
