arXiv:2310.17247 [cs.LG]

Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

Jack Miller, Charles O'Neill, Thang Bui

Published 2023-10-26 (Version 1)

In some settings, neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression and linear regression. We also uncover a mechanism for inducing grokking on algorithmic datasets by adding dimensions containing spurious information. The presence of the phenomenon in non-neural architectures provides evidence that grokking is not specific to SGD or weight norm regularisation. Instead, grokking may be possible in any setting where solution search is guided by complexity and error. Based on this insight and further trends we see in the training trajectories of a Bayesian neural network (BNN) and a GP regression model, we make progress towards a more general theory of grokking. Specifically, we hypothesise that the phenomenon is governed by the accessibility of certain regions in the error and complexity landscapes.
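As a rough illustration of the spurious-dimension mechanism described above, the sketch below augments an algorithmic dataset (modular addition) with extra input dimensions that carry no information about the label. This is only a minimal, hypothetical reconstruction; the exact construction used in the paper may differ, and all function names and parameters here are illustrative assumptions.

import numpy as np

def modular_addition_dataset(p=7):
    """All pairs (a, b) with label (a + b) mod p; inputs are one-hot encoded."""
    pairs = np.array([(a, b) for a in range(p) for b in range(p)])
    labels = (pairs[:, 0] + pairs[:, 1]) % p
    one_hot = np.zeros((len(pairs), 2 * p))
    one_hot[np.arange(len(pairs)), pairs[:, 0]] = 1.0          # encode a
    one_hot[np.arange(len(pairs)), p + pairs[:, 1]] = 1.0      # encode b
    return one_hot, labels

def add_spurious_dimensions(X, n_spurious=200, scale=1.0, seed=0):
    """Append pure-noise dimensions, uncorrelated with the labels (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    noise = scale * rng.standard_normal((X.shape[0], n_spurious))
    return np.concatenate([X, noise], axis=1)

X, y = modular_addition_dataset(p=7)
X_spurious = add_spurious_dimensions(X)
print(X.shape, X_spurious.shape)  # (49, 14) (49, 214)

A model trained on X_spurious must learn to ignore the noise dimensions, which, per the abstract, is one way to induce a delayed jump in validation accuracy on such datasets.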
