Theory of Probability and Mathematical Statistics
A note on the prediction error of principal component regression in high dimensions
Laura Hucker and Martin Wahl
Abstract: We analyze the prediction error of principal component regression (PCR) and prove high-probability bounds for the corresponding squared risk conditional on the design. Our first main result shows that PCR performs comparably to the oracle method obtained by replacing empirical principal components by their population counterparts, provided that an effective rank condition holds. On the other hand, if the latter condition is violated, then empirical eigenvalues start to have a significant upward bias, resulting in a self-induced regularization of PCR. Our approach relies on the behavior of empirical eigenvalues, empirical eigenvectors and the excess risk of principal component analysis in high-dimensional regimes.
Keywords: Principal component regression, prediction error, principal component analysis, excess risk, eigenvalue upward bias, benign overfitting
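The contrast drawn in the abstract, between PCR with empirical principal components and the oracle that uses population principal components, can be illustrated with a minimal NumPy sketch. It assumes a Gaussian design with a hypothetical spiked diagonal covariance Σ = diag(spikes, 1, ..., 1), the uncentered empirical covariance XᵀX/n, and a coefficient vector supported on the spiked coordinates. The function names (`top_eigvecs`, `pcr`, `prediction_error`, `simulate`) and all simulation parameters are illustrative choices, not taken from the paper. Note also that `prediction_error` evaluates (β̂ − β)ᵀΣ(β̂ − β) for a single noise realization, which differs from the paper's squared risk conditional on the design (an expectation over the noise).

```python
import numpy as np


def top_eigvecs(S, k):
    """Return the k largest eigenvalues of a symmetric matrix S and their eigenvectors."""
    vals, vecs = np.linalg.eigh(S)           # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:k]       # indices of the k largest
    return vals[order], vecs[:, order]


def pcr(X, y, U):
    """Least squares restricted to span(U): beta_hat = U a, a = argmin ||y - X U a||^2."""
    a, *_ = np.linalg.lstsq(X @ U, y, rcond=None)
    return U @ a


def prediction_error(beta_hat, beta, Sigma):
    """Squared prediction error for a fresh x ~ N(0, Sigma): (b_hat - b)^T Sigma (b_hat - b)."""
    d = beta_hat - beta
    return float(d @ Sigma @ d)


def simulate(n, p, k, spikes, noise_sd=1.0, seed=0):
    """Compare empirical PCR with the oracle that uses population principal components.

    Assumption (illustrative only): Sigma = diag(spikes, 1, ..., 1) and beta = 1 on
    the spiked coordinates, 0 elsewhere.
    """
    rng = np.random.default_rng(seed)
    diag = np.ones(p)
    diag[: len(spikes)] = spikes
    Sigma = np.diag(diag)
    X = rng.standard_normal((n, p)) * np.sqrt(diag)  # rows ~ N(0, Sigma)
    beta = np.zeros(p)
    beta[: len(spikes)] = 1.0
    y = X @ beta + noise_sd * rng.standard_normal(n)

    Sigma_hat = X.T @ X / n                          # empirical (uncentered) covariance
    emp_vals, U_hat = top_eigvecs(Sigma_hat, k)      # empirical principal components
    pop_vals, U_pop = top_eigvecs(Sigma, k)          # population principal components
    return {
        "emp_vals": emp_vals,
        "pop_vals": pop_vals,
        "err_pcr": prediction_error(pcr(X, y, U_hat), beta, Sigma),
        "err_oracle": prediction_error(pcr(X, y, U_pop), beta, Sigma),
    }


if __name__ == "__main__":
    # Low-dimensional regime (p << n): empirical eigenvalues track population ones.
    print(simulate(n=2000, p=20, k=2, spikes=[10.0, 5.0]))
    # High-dimensional regime (p >> n): empirical eigenvalues are biased upward.
    print(simulate(n=40, p=800, k=2, spikes=[10.0, 5.0]))
```

Run as a script, the first call should report empirical eigenvalues close to the population values (10, 5), while in the second call they clearly exceed them, in line with the upward bias described in the abstract; the exact numbers depend on the seed. Writing PCR as least squares over a given basis `U` lets the same `pcr` function serve both the empirical method and the oracle.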