arXiv:1710.09553 [cs.LG]

Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior

Charles H. Martin, Michael W. Mahoney

Published 2017-10-26 (Version 1)

We describe an approach to understanding the peculiar and counterintuitive generalization properties of deep neural networks. The approach involves going beyond the worst-case theoretical capacity control frameworks that have been popular in machine learning in recent years, in order to revisit old ideas from the statistical mechanics of neural networks. Within this approach, we present a prototypical Very Simple Deep Learning (VSDL) model, whose behavior is governed by two control parameters: one describing an effective amount of data, or load, on the network (which decreases when noise is added to the input), and one with an effective temperature interpretation (which increases when algorithms are stopped early). Using this model, we describe how a very simple application of ideas from the statistical mechanics theory of generalization provides a strong qualitative description of recently observed empirical results, including the inability of deep neural networks to avoid overfitting training data, and discontinuous learning and sharp transitions in the generalization properties of learning algorithms.
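
As a rough illustration of the qualitative picture sketched in the abstract, the short Python sketch below wires up the two control parameters (a load that drops as input noise is added, and a temperature that rises with earlier stopping) and a toy two-phase rule with a sharp boundary. The functional forms, threshold values, and function names here are assumptions made purely for illustration, not the model from the paper.

# Illustrative sketch only: a toy phase diagram in the spirit of the
# two-parameter VSDL picture. All functional forms and thresholds are
# assumptions for illustration, not taken from the paper.

def effective_load(n_samples, n_params, noise_level):
    # Load alpha ~ usable examples per parameter; adding noise to the
    # input reduces the effective amount of usable data.
    return (n_samples * (1.0 - noise_level)) / n_params

def effective_temperature(base_temp, train_fraction):
    # Stopping training earlier (smaller completed fraction) is treated
    # as a higher effective temperature.
    return base_temp / max(train_fraction, 1e-6)

def generalization_phase(alpha, temp, alpha_c=1.0, temp_c=1.0):
    # Toy two-phase rule: good generalization only when the load is high
    # enough and the temperature low enough; the boundary is sharp.
    return "generalizes" if (alpha > alpha_c and temp < temp_c) else "memorizes/overfits"

if __name__ == "__main__":
    # Sweep input noise and early stopping to see where the sharp
    # transition between the two phases sits in this toy setup.
    for noise in (0.0, 0.3, 0.6):
        for train_frac in (1.0, 0.5, 0.1):
            a = effective_load(n_samples=10_000, n_params=5_000, noise_level=noise)
            t = effective_temperature(base_temp=0.5, train_fraction=train_frac)
            print(f"noise={noise:.1f} trained={train_frac:.1f} -> "
                  f"alpha={a:.2f}, T={t:.2f}, {generalization_phase(a, t)}")

In this toy setup, moving either knob past its (assumed) critical value flips the output discontinuously, which is the kind of sharp-transition behavior the abstract attributes to the statistical mechanics viewpoint.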

Related articles:
arXiv:2006.07054 [cs.LG] (Published 2020-06-12)
Learning TSP Requires Rethinking Generalization
arXiv:1611.03530 [cs.LG] (Published 2016-11-10)
Understanding deep learning requires rethinking generalization
arXiv:2310.03957 [cs.LG] (Published 2023-10-06)
Understanding prompt engineering may not require rethinking generalization