arXiv Analytics

arXiv:2305.17297 [cs.LG]

Generalization Error without Independence: Denoising, Linear Regression, and Transfer Learning

Chinmaya Kausik, Kashvi Srivastava, Rishi Sonthalia

Published 2023-05-26 (Version 1)

Studying the generalization abilities of linear models with real data is a central question in statistical learning. While a limited number of important prior works (Loureiro et al. 2021a, 2021b; Wei et al. 2022) do validate theory against real data, they rely on restrictive technical assumptions, such as a well-conditioned covariance matrix and independent and identically distributed (i.i.d.) data, that need not hold for real data. Additionally, prior works that address distribution shift usually impose technical assumptions on the joint distribution of the train and test data (Tripuraneni et al. 2021; Wu and Xu 2020) and do not test on real data. To address these issues and better model real data, we study data that is not i.i.d. but has low-rank structure. Further, we address distribution shift by decoupling the assumptions on the training and test distributions. We provide analytical formulas for the generalization error of the denoising problem that are asymptotically exact. These formulas are used to derive theoretical results for linear regression, data augmentation, principal component regression, and transfer learning. We validate all of our theoretical results on real data, obtaining a low relative mean squared error of around 1% between the empirical risk and our estimated risk.
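A minimal sketch of the denoising setup the abstract describes: data with low-rank structure (so rows are not i.i.d.) observed under additive noise, a linear denoiser fit by least squares, and its generalization error measured empirically on fresh data. The dimensions, rank, and noise level below are illustrative assumptions, not values from the paper, and the least-squares denoiser stands in for whatever estimator the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n samples, ambient dimension d, latent rank r << d.
n, d, r = 500, 100, 5
U = rng.normal(size=(n, r))
V = rng.normal(size=(r, d)) / np.sqrt(r)
X = U @ V                                    # clean low-rank data
A = X + rng.normal(scale=0.5, size=(n, d))   # noisy observations

# Fit a linear denoiser W minimizing ||A W - X||_F^2 via least squares.
W, *_ = np.linalg.lstsq(A, X, rcond=None)

# Empirical generalization error on fresh data from the same low-rank model.
U_test = rng.normal(size=(n, r))
X_test = U_test @ V
A_test = X_test + rng.normal(scale=0.5, size=(n, d))
risk = np.mean((A_test @ W - X_test) ** 2)
print(risk)
```

Because the signal lives in a rank-5 subspace of the 100-dimensional ambient space, the learned denoiser can project away most of the noise, so the per-entry risk comes out well below the raw noise variance of 0.25.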

Related articles:
arXiv:2006.07002 [cs.LG] (Published 2020-06-12)
Double Double Descent: On Generalization Errors in Transfer Learning between Linear Regression Tasks
arXiv:1705.07048 [cs.LG] (Published 2017-05-19)
Linear regression without correspondence
arXiv:1206.3274 [cs.LG] (Published 2012-06-13)
Small Sample Inference for Generalization Error in Classification Using the CUD Bound