arXiv:1402.7025 [cs.LG]

Exploiting the Statistics of Learning and Inference

Max Welling

Published 2014-02-26, updated 2014-03-04 (version 2)

When dealing with datasets containing a billion instances or with simulations that require a supercomputer to execute, computational resources become part of the equation. We can improve the efficiency of learning and inference by exploiting their inherent statistical nature. We propose algorithms that exploit the redundancy of data relative to a model by subsampling data-cases for every update and reasoning about the uncertainty created in this process. In the context of learning, we propose to test for the probability that a stochastically estimated gradient points in the wrong direction, i.e., more than 90 degrees away from the true gradient. In the context of MCMC sampling, we use stochastic gradients to improve the efficiency of MCMC updates, and hypothesis tests based on adaptive mini-batches to decide whether to accept or reject a proposed parameter update. Finally, we argue that in the context of likelihood-free MCMC one needs to store the information revealed by all simulations, for instance in a Gaussian process. We conclude that Bayesian methods will continue to play a crucial role in the era of big data and big simulations, but only if we overcome a number of computational challenges.
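The stochastic-gradient MCMC idea mentioned in the abstract can be illustrated with stochastic gradient Langevin dynamics (SGLD, Welling and Teh 2011), which replaces full-data gradients with mini-batch estimates and injects Gaussian noise matched to the step size. The following is a minimal sketch, not the paper's implementation: the Gaussian toy model, the step-size schedule, and all variable names are illustrative assumptions.

    import numpy as np

    # SGLD sketch on a hypothetical toy problem: data x_i ~ N(theta_true, 1)
    # with a flat prior on theta, so the posterior gradient is just the
    # sum of per-example log-likelihood gradients.
    rng = np.random.default_rng(0)
    theta_true = 2.0
    N = 100_000
    data = rng.normal(theta_true, 1.0, size=N)

    def grad_log_post(theta, batch):
        # d/dtheta log N(x | theta, 1) = (x - theta); rescale the
        # mini-batch sum by N / |batch| for an unbiased estimate of
        # the full-data gradient.
        return (N / len(batch)) * np.sum(batch - theta)

    theta = 0.0
    batch_size = 100
    samples = []
    for t in range(1, 5001):
        # Polynomially decaying step size (an illustrative schedule).
        eps = 0.5 / (N * (1.0 + t) ** 0.55)
        batch = rng.choice(data, size=batch_size, replace=False)
        g = grad_log_post(theta, batch)
        # Langevin update: half a gradient step plus Gaussian noise
        # whose variance equals the step size.
        theta += 0.5 * eps * g + rng.normal(0.0, np.sqrt(eps))
        samples.append(theta)

    print("posterior mean estimate:", np.mean(samples[1000:]))

The key design choice is that the injected noise variance equals the step size: as the step size decays, the iterates transition from stochastic optimization toward approximate posterior sampling without any Metropolis accept/reject step on the full dataset.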

Comments: Proceedings of the NIPS workshop on "Probabilistic Models for Big Data"
Categories: cs.LG
Related articles:
arXiv:1909.01736 [cs.LG] (Published 2019-09-03)
Beyond Human-Level Accuracy: Computational Challenges in Deep Learning
arXiv:2302.07986 [cs.LG] (Published 2023-02-15)
On the Detection and Quantification of Nonlinearity via Statistics of the Gradients of a Black-Box Model
arXiv:2109.05280 [cs.LG] (Published 2021-09-11)
Learning To Describe Player Form in The MLB