Theory of Probability and Mathematical Statistics
Discriminant analysis in small and large dimensions
T. Bodnar, S. Mazur, E. Ngailo, N. Parolya
Abstract: We study the distributional properties of the linear discriminant function under the assumption of normality, comparing two groups with the same covariance matrix but different mean vectors. We derive a stochastic representation for the discriminant function coefficients and use it to obtain their asymptotic distribution under the high-dimensional asymptotic regime. We then investigate the performance of classification based on the discriminant function in both small and large dimensions. A further stochastic representation is established that allows the error rate to be computed efficiently. We compare the resulting error rate with the optimal one obtained under the assumption that the covariance matrix and the two mean vectors are known, and we present an analytical expression for the error rate in the high-dimensional asymptotic regime. The finite-sample properties of the derived theoretical results are assessed via an extensive Monte Carlo study.
Keywords: Discriminant function, stochastic representation, large-dimensional asymptotics, random matrix theory, classification analysis.
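The objects discussed above can be illustrated with a minimal sketch: the sample linear discriminant coefficients S⁻¹(x̄₁ − x̄₂), a Monte Carlo estimate of the conditional error rate for one group, and the optimal error rate Φ(−Δ/2) under known parameters. The concrete setup (dimension p = 5, group sizes n₁ = n₂ = 50, identity covariance, constant mean shift) is a hypothetical choice for illustration, not taken from the paper.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two p-variate normal groups, common covariance.
p, n1, n2 = 5, 50, 50
mu1, mu2 = np.zeros(p), np.full(p, 0.8)
Sigma = np.eye(p)

X1 = rng.multivariate_normal(mu1, Sigma, n1)
X2 = rng.multivariate_normal(mu2, Sigma, n2)

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
# Pooled sample covariance matrix
S = ((n1 - 1) * np.cov(X1, rowvar=False)
     + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)

# Sample discriminant coefficients and classification threshold
w = np.linalg.solve(S, xbar1 - xbar2)
c = w @ (xbar1 + xbar2) / 2

# Monte Carlo estimate of the conditional error rate for group 1:
# a group-1 observation is misclassified when w'x falls below c.
test = rng.multivariate_normal(mu1, Sigma, 100_000)
err = np.mean(test @ w < c)

# Optimal (Bayes) error rate with known parameters: Phi(-Delta/2),
# where Delta^2 = (mu1 - mu2)' Sigma^{-1} (mu1 - mu2).
def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

Delta = math.sqrt((mu1 - mu2) @ np.linalg.solve(Sigma, mu1 - mu2))
opt = Phi(-Delta / 2)
```

Because `w` and `c` are estimated from finite samples, the Monte Carlo error rate `err` typically lies somewhat above the optimal rate `opt`; the gap shrinks as the sample sizes grow relative to the dimension.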