arXiv:2310.17247 [cs.LG]

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

Jack Miller, Charles O'Neill, Thang Bui

Published 2023-10-26 (Version 1)

In some settings, neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression and linear regression. We also uncover a mechanism for inducing grokking on algorithmic datasets by adding dimensions containing spurious information. The presence of the phenomenon in non-neural architectures provides evidence that grokking is not specific to SGD or weight norm regularisation. Instead, grokking may be possible in any setting where solution search is guided by complexity and error. Based on this insight and further trends we see in the training trajectories of a Bayesian neural network (BNN) and a GP regression model, we make progress towards a more general theory of grokking. Specifically, we hypothesise that the phenomenon is governed by the accessibility of certain regions in the error and complexity landscapes.
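As a rough illustration of the spurious-dimension mechanism described above, the sketch below augments an algorithmic dataset (modular addition) with extra input dimensions that carry no information about the label. This is only a minimal, hypothetical reconstruction; the exact construction used in the paper may differ, and all function names and parameters here are illustrative assumptions.

import numpy as np

def modular_addition_dataset(p=7):
    """All pairs (a, b) with label (a + b) mod p; inputs are one-hot encoded."""
    pairs = np.array([(a, b) for a in range(p) for b in range(p)])
    labels = (pairs[:, 0] + pairs[:, 1]) % p
    one_hot = np.zeros((len(pairs), 2 * p))
    one_hot[np.arange(len(pairs)), pairs[:, 0]] = 1.0          # encode a
    one_hot[np.arange(len(pairs)), p + pairs[:, 1]] = 1.0      # encode b
    return one_hot, labels

def add_spurious_dimensions(X, n_spurious=200, scale=1.0, seed=0):
    """Append pure-noise dimensions, uncorrelated with the labels (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    noise = scale * rng.standard_normal((X.shape[0], n_spurious))
    return np.concatenate([X, noise], axis=1)

X, y = modular_addition_dataset(p=7)
X_spurious = add_spurious_dimensions(X)
print(X.shape, X_spurious.shape)  # (49, 14) (49, 214)

A model trained on X_spurious must learn to ignore the noise dimensions, which, per the abstract, is one way to induce a delayed jump in validation accuracy on such datasets.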
