arXiv:2002.05715 Abstract | arXiv Analytics

arXiv:2002.05715 [cs.LG]

Self-Distillation Amplifies Regularization in Hilbert Space

Hossein Mobahi, Mehrdad Farajtabar, Peter L. Bartlett

Published 2020-02-13, Version 1

Knowledge distillation, introduced in the deep learning context, is a method to transfer knowledge from one architecture to another. In particular, when the architectures are identical, this is called self-distillation. The idea is to feed in predictions of the trained model as new target values for retraining (and possibly iterate this loop a few times). It has been empirically observed that the self-distilled model often achieves higher accuracy on held-out data. Why this happens, however, has been a mystery: the self-distillation dynamics do not receive any new information about the task and evolve solely by looping over training. To the best of our knowledge, there is no rigorous understanding of why this happens. This work provides the first theoretical analysis of self-distillation. We focus on fitting a nonlinear function to training data, where the model space is Hilbert space and fitting is subject to L2 regularization in this function space. We show that self-distillation iterations modify regularization by progressively limiting the number of basis functions that can be used to represent the solution. This implies (as we also verify empirically) that while a few rounds of self-distillation may reduce over-fitting, further rounds may lead to under-fitting and thus worse performance.
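The mechanism described in the abstract can be illustrated with kernel ridge regression, one standard instance of L2-regularized fitting in a reproducing kernel Hilbert space. This is a minimal sketch, not the authors' code, and it rests on assumptions: a Gaussian kernel (`rbf_kernel` and `gamma` are hypothetical choices) and a fixed regularization strength `lam` reused in every round, whereas the paper's analysis lets the effective regularization change between rounds. Under these assumptions each round multiplies the current targets by K(K + λI)^{-1}. After t rounds, the component of the labels along an eigenvector of K with eigenvalue d_i is therefore scaled by (d_i/(d_i + λ))^t. Directions with small eigenvalues fade fastest, which gives a soft version of "progressively limiting the number of basis functions".

```python
import numpy as np

def rbf_kernel(X, Y, gamma=10.0):
    # Gaussian (RBF) kernel matrix between rows of X and rows of Y.
    # ASSUMPTION: the kernel and gamma are illustrative choices, not from the paper.
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def self_distill(K, y, lam, rounds):
    """Return the training-set predictions after each self-distillation round.

    Each round fits kernel ridge regression, alpha = (K + lam*I)^{-1} targets,
    and the fitted values K @ alpha become the targets of the next round.
    ASSUMPTION: lam is held fixed across rounds (a simplification).
    """
    n = K.shape[0]
    targets = y.copy()
    history = []
    for _ in range(rounds):
        alpha = np.linalg.solve(K + lam * np.eye(n), targets)
        targets = K @ alpha  # the model's own predictions replace the labels
        history.append(targets.copy())
    return history

def eigen_weights(K, lam, rounds):
    # Shrinkage applied to each eigen-direction of K after t rounds:
    # (d_i / (d_i + lam))^t, for t = 1..rounds.
    d = np.linalg.eigvalsh(K)
    return [(d / (d + lam)) ** t for t in range(1, rounds + 1)]

# Toy data: noisy samples of a sine wave (hypothetical example).
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 40))[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(40)
K = rbf_kernel(X, X)
lam = 0.05

preds = self_distill(K, y, lam, rounds=5)
# Hypothetical proxy for "basis functions in use": eigen-directions that
# still keep more than half of their original weight after t rounds.
counts = [int(np.sum(w > 0.5)) for w in eigen_weights(K, lam, 5)]
print("effective basis size per round:", counts)
```

Because every factor d_i/(d_i + λ) lies in [0, 1), each entry of `counts` can only stay the same or shrink as rounds accumulate. This is consistent with the abstract's point that a few rounds may curb over-fitting while many rounds risk under-fitting. The 0.5 threshold is arbitrary and only serves to make the shrinkage visible.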

Categories: cs.LG, stat.ML

Keywords: self-distillation amplifies regularization, hilbert space, self-distillation iterations modify regularization, achieves higher accuracy, nonlinear function
