Theory of Probability and Mathematical Statistics
Graphical posterior predictive classification: Bayesian model averaging with particle Gibbs
Tatjana Pavlenko and Felix L. Rios
Abstract: In this study, we present a multi-class graphical Bayesian predictive classifier that incorporates the uncertainty in model selection into the standard Bayesian formalism. For each class, the dependence structure underlying the observed features is represented by a set of decomposable Gaussian graphical models. Emphasis is placed on Bayesian model averaging, which takes full account of the class-specific model uncertainty by averaging over the posterior graph model probabilities. Explicit evaluation of these model probabilities is well known to be infeasible. To address this issue, we consider the particle Gibbs strategy of J. Olsson, T. Pavlenko, and F. L. Rios [Electron. J. Stat. 13 (2019), no. 2, 2865–2897] for posterior sampling from decomposable graphical models, which uses the so-called Christmas tree algorithm of J. Olsson, T. Pavlenko, and F. L. Rios [Stat. Comput. 32 (2022), no. 5, Paper No. 80, 18] as its proposal kernel. We also derive a strong hyper Markov law, which we call the hyper normal Wishart law, that allows the resulting Bayesian calculations to be performed locally. The proposed graphical predictive classifier shows superior performance compared to the ordinary Bayesian predictive rule, which does not account for model uncertainty, as well as to a number of out-of-the-box classifiers.
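To illustrate the model-averaging idea described above, here is a minimal Python sketch of a graphical predictive classifier. It is a deliberate simplification and not the authors' method: each class uses a fixed, known mean and covariance (plug-in Gaussian densities) in place of the hyper normal Wishart posterior predictive, and the graph samples are supplied by hand rather than drawn by particle Gibbs with the Christmas tree algorithm. The function names (`gaussian_logpdf`, `decomposable_logpdf`, `bma_log_predictive`, `classify`) and the toy parameters are hypothetical. The sketch shows two things: how a decomposable graph lets the density be evaluated from clique and separator marginals only, and how averaging over posterior graph draws enters each class score.

```python
import numpy as np

# Simplified sketch (assumption): plug-in Gaussian class densities stand in for
# the paper's hyper normal Wishart posterior predictive, and graph samples are
# given directly (e.g. they could come from a particle Gibbs sampler, not shown).


def gaussian_logpdf(x, mean, cov):
    """Log-density of the multivariate normal N(mean, cov) evaluated at x."""
    d = len(x)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + quad)


def decomposable_logpdf(x, mean, cov, cliques, separators):
    """Log-density of a Gaussian that is Markov w.r.t. a decomposable graph.

    Uses the clique/separator factorization
        p(x) = prod_C p(x_C) / prod_S p(x_S),
    so only low-dimensional marginals are evaluated ("local" computation).
    `separators` must list each separator with its multiplicity.
    """
    total = 0.0
    for clique in cliques:
        idx = list(clique)
        total += gaussian_logpdf(x[idx], mean[idx], cov[np.ix_(idx, idx)])
    for sep in separators:
        idx = list(sep)
        if idx:  # an empty separator contributes a factor of 1
            total -= gaussian_logpdf(x[idx], mean[idx], cov[np.ix_(idx, idx)])
    return total


def bma_log_predictive(x, mean, cov, graph_samples):
    """Monte Carlo Bayesian model average over sampled graphs.

    Each element of `graph_samples` is a (cliques, separators) pair, assumed to
    be a draw from the class-specific graph posterior, so equal weights are used.
    """
    logs = np.array([decomposable_logpdf(x, mean, cov, cl, sep)
                     for cl, sep in graph_samples])
    m = logs.max()
    # Log of the average of exp(logs), computed stably (log-sum-exp trick).
    return m + np.log(np.mean(np.exp(logs - m)))


def classify(x, class_models, log_priors):
    """Assign x to the class with the largest prior-weighted averaged predictive."""
    scores = [lp + bma_log_predictive(x, mean, cov, graphs)
              for lp, (mean, cov, graphs) in zip(log_priors, class_models)]
    return int(np.argmax(scores))


# Usage with hypothetical toy parameters and three graphs on 3 features.
graphs = [
    ([(0, 1), (1, 2)], [(1,)]),   # chain 0 - 1 - 2
    ([(0,), (1,), (2,)], []),     # empty graph
    ([(0, 1, 2)], []),            # complete graph
]
class_models = [
    (np.zeros(3), np.eye(3), graphs),
    (np.full(3, 3.0), np.eye(3), graphs),
]
log_priors = np.log([0.5, 0.5])
print(classify(np.array([0.1, -0.2, 0.3]), class_models, log_priors))  # 0
print(classify(np.array([2.9, 3.1, 3.0]), class_models, log_priors))   # 1
```

Because the graphs are treated as draws from the posterior, the plain average approximates the posterior-weighted average over graph models. Replacing the plug-in density with an exact posterior predictive would require the local conjugate computations that the hyper normal Wishart law is designed to provide.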
Keywords: Decomposable graphical models, strong hyper Markov law, particle Markov chain Monte Carlo
Bibliography: Christophe Andrieu, Arnaud Doucet, and Roman Holenstein, Particle Markov chain Monte Carlo methods, J. R. Stat. Soc. Ser. B Stat. Methodol. 72 (2010), no. 3, 269–342. MR 2758115, DOI 10.1111/j.1467-9868.2009.00736.x
José-M. Bernardo and Adrian F. M. Smith, Bayesian theory, Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics, John Wiley & Sons, Ltd., Chichester, 1994. MR 1274699, DOI 10.1002/9780470316870
Simon Byrne and A. Philip Dawid, Structural Markov graph laws for Bayesian model uncertainty, Ann. Statist. 43 (2015), no. 4, 1647–1681. MR 3357874, DOI 10.1214/15-AOS1319
Nicolas Chopin and Sumeetpal S. Singh, On particle Gibbs sampling, Bernoulli 21 (2015), no. 3, 1855–1883. MR 3352064, DOI 10.3150/14-BEJ629
Merlise Clyde and Edward I. George, Model uncertainty, Statist. Sci. 19 (2004), no. 1, 81–94. MR 2082148, DOI 10.1214/088342304000000035
Jukka Corander, Yaqiong Cui, and Timo Koski, Inductive inference and partition exchangeability in classification, Algorithmic probability and friends, Lecture Notes in Comput. Sci., vol. 7070, Springer, Heidelberg, 2013, pp. 91–105. MR 3128216, DOI 10.1007/978-3-642-44958-1_7
Jukka Corander, Yaqiong Cui, Timo Koski, and Jukka Sirén, Have I seen you before? Principles of Bayesian predictive classification revisited, Stat. Comput. 23 (2013), no. 1, 59–73. MR 3018350, DOI 10.1007/s11222-011-9291-7
J. Corander, T. Koski, T. Pavlenko, and A. Tillander, Bayesian block-diagonal predictive classifier for Gaussian data, Synergies of Soft Computing and Statistics for Intelligent Data Analysis, Springer, Berlin, Heidelberg, 2013, pp. 543–551.
Yaqiong Cui, Jukka Sirén, Timo Koski, and Jukka Corander, Simultaneous predictive Gaussian classifiers, J. Classification 33 (2016), no. 1, 73–102. MR 3503204, DOI 10.1007/s00357-016-9197-3
A. P. Dawid and B. Q. Fang, Conjugate Bayes discrimination with infinitely many variables, J. Multivariate Anal. 41 (1992), no. 1, 27–42. MR 1156679, DOI 10.1016/0047-259X(92)90055-K
A. P. Dawid and S. L. Lauritzen, Hyper-Markov laws in the statistical analysis of decomposable graphical models, Ann. Statist. 21 (1993), no. 3, 1272–1317. MR 1241267, DOI 10.1214/aos/1176349260
Seymour Geisser, Posterior odds for multivariate normal classifications, J. Roy. Statist. Soc. Ser. B 26 (1964), 69–76. MR 174133, DOI 10.1111/j.2517-6161.1964.tb00540.x
Seymour Geisser, Predictive discrimination, Multivariate Analysis (Proc. Internat. Sympos., Dayton, Ohio, 1965) Academic Press, New York-London, 1966, pp. 149–163. MR 211539
Seymour Geisser, Predictive inference, Monographs on Statistics and Applied Probability, vol. 55, Chapman and Hall, New York, 1993. An introduction. MR 1252174, DOI 10.1007/978-1-4899-4467-2
Peter J. Green and Alun Thomas, Sampling decomposable graphs using a Markov chain on junction trees, Biometrika 100 (2013), no. 1, 91–110. MR 3034326, DOI 10.1093/biomet/ass052
Peter J. Green and Alun Thomas, A structural Markov property for decomposable graph laws that allows control of clique intersections, Biometrika 105 (2018), no. 1, 19–29. MR 3768862, DOI 10.1093/biomet/asx072
Robert E. Kass and Adrian E. Raftery, Bayes factors, J. Amer. Statist. Assoc. 90 (1995), no. 430, 773–795. MR 3363402, DOI 10.1080/01621459.1995.10476572
Steffen L. Lauritzen, Graphical models, Oxford Statistical Science Series, vol. 17, The Clarendon Press, Oxford University Press, New York, 1996. Oxford Science Publications. MR 1419991
D. Madigan and A. E. Raftery, Model selection and accounting for model uncertainty in graphical models using Occam’s window, J. Amer. Statist. Assoc. 89 (1994), no. 428, 1535–1546.
D. Madigan, J. York, and D. Allard, Bayesian graphical models for discrete data, Internat. Statist. Rev. 63 (1995), no. 2, 215–232.
Henrik Nyman, Jie Xiong, Johan Pensar, and Jukka Corander, Marginal and simultaneous predictive classification using stratified graphical models, Adv. Data Anal. Classif. 10 (2016), no. 3, 305–326. MR 3541238, DOI 10.1007/s11634-015-0199-5
Jimmy Olsson, Tatjana Pavlenko, and Felix L. Rios, Bayesian learning of weakly structural Markov graph laws using sequential Monte Carlo methods, Electron. J. Stat. 13 (2019), no. 2, 2865–2897. MR 3998930, DOI 10.1214/19-EJS1585
Jimmy Olsson, Tatjana Pavlenko, and Felix L. Rios, Sequential sampling of junction trees for decomposable graphs, Stat. Comput. 32 (2022), no. 5, Paper No. 80, 18. MR 4487691, DOI 10.1007/s11222-022-10113-2
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort et al., Scikit-learn: machine learning in Python, J. Mach. Learn. Res. 12 (2011), 2825–2830. MR 2854348
A. Reiss and D. Stricker, Creating and benchmarking a new dataset for physical activity monitoring, Proceedings of the 5th International Conference on Pervasive Technologies Related to Assistive Environments, ACM, 2012, p. 40.
Amir M. Ben-Amram, Introducing: reasonable complete programming languages, Bull. Eur. Assoc. Theor. Comput. Sci. EATCS 64 (1998), 153–155. MR 1618301
F. L. Rios, G. Moffa, and J. Kuipers, Benchpress: a scalable and versatile workflow for benchmarking structure learning algorithms for graphical models, arXiv preprint arXiv:2107.03863 (2021).
B. D. Ripley, Pattern recognition and neural networks, Cambridge University Press, Cambridge, 2007. Reprint of the 1996 original. MR 2451352
Alun Thomas and Peter J. Green, Enumerating the junction trees of a decomposable graph, J. Comput. Graph. Statist. 18 (2009), no. 4, 930–940. MR 2598034, DOI 10.1198/jcgs.2009.07129
Nicholas C. Wormald, Counting labelled chordal graphs, Graphs Combin. 1 (1985), no. 2, 193–200. MR 951781, DOI 10.1007/BF02582944