Contents

Preface

1 Introduction to Bayesian Statistics

1.1 THE FREQUENTIST APPROACH TO STATISTICS

1.2 THE BAYESIAN APPROACH TO STATISTICS

1.3 COMPARING LIKELIHOOD AND BAYESIAN APPROACHES TO STATISTICS

1.4 COMPUTATIONAL BAYESIAN STATISTICS

1.5 PURPOSE AND ORGANIZATION OF THIS BOOK

2 Monte Carlo Sampling from the Posterior

2.1 ACCEPTANCE-REJECTION-SAMPLING

2.2 SAMPLING-IMPORTANCE-RESAMPLING

2.3 ADAPTIVE-REJECTION-SAMPLING FROM A LOG-CONCAVE DISTRIBUTION

2.4 WHY DIRECT METHODS ARE INEFFICIENT FOR HIGH-DIMENSION PARAMETER SPACE

3 Bayesian Inference

3.1 BAYESIAN INFERENCE FROM THE NUMERICAL POSTERIOR

3.2 BAYESIAN INFERENCE FROM POSTERIOR RANDOM SAMPLE

4 Bayesian Statistics Using Conjugate Priors

4.1 ONE-DIMENSIONAL EXPONENTIAL FAMILY OF DENSITIES

4.2 DISTRIBUTIONS FOR COUNT DATA

4.3 DISTRIBUTIONS FOR WAITING TIMES

4.4 NORMALLY DISTRIBUTED OBSERVATIONS WITH KNOWN VARIANCE

4.5 NORMALLY DISTRIBUTED OBSERVATIONS WITH KNOWN MEAN

4.6 NORMALLY DISTRIBUTED OBSERVATIONS WITH UNKNOWN MEAN AND VARIANCE

4.7 MULTIVARIATE NORMAL OBSERVATIONS WITH KNOWN COVARIANCE MATRIX

4.8 OBSERVATIONS FROM NORMAL LINEAR REGRESSION MODEL

Appendix: Proof of Poisson Process Theorem

5 Markov Chains

5.1 STOCHASTIC PROCESSES

5.2 MARKOV CHAINS

5.3 TIME-INVARIANT MARKOV CHAINS WITH FINITE STATE SPACE

5.4 CLASSIFICATION OF STATES OF A MARKOV CHAIN

5.5 SAMPLING FROM A MARKOV CHAIN

5.6 TIME-REVERSIBLE MARKOV CHAINS AND DETAILED BALANCE

5.7 MARKOV CHAINS WITH CONTINUOUS STATE SPACE

6 Markov Chain Monte Carlo Sampling from Posterior

6.1 METROPOLIS-HASTINGS ALGORITHM FOR A SINGLE PARAMETER

6.2 METROPOLIS-HASTINGS ALGORITHM FOR MULTIPLE PARAMETERS

6.3 BLOCKWISE METROPOLIS-HASTINGS ALGORITHM

6.4 GIBBS SAMPLING

6.5 SUMMARY

7 Statistical Inference from a Markov Chain Monte Carlo Sample

7.1 MIXING PROPERTIES OF THE CHAIN

7.2 FINDING A HEAVY-TAILED MATCHED CURVATURE CANDIDATE DENSITY

7.3 OBTAINING AN APPROXIMATE RANDOM SAMPLE FOR INFERENCE

Appendix: Procedure for Finding the Matched Curvature Candidate Density for a Multivariate Parameter

8 Logistic Regression

8.1 LOGISTIC REGRESSION MODEL

8.2 COMPUTATIONAL BAYESIAN APPROACH TO THE LOGISTIC REGRESSION MODEL

8.3 MODELLING WITH THE MULTIPLE LOGISTIC REGRESSION MODEL

9 Poisson Regression and Proportional Hazards Model

9.1 POISSON REGRESSION MODEL

9.2 COMPUTATIONAL APPROACH TO POISSON REGRESSION MODEL

9.3 THE PROPORTIONAL HAZARDS MODEL

9.4 COMPUTATIONAL BAYESIAN APPROACH TO PROPORTIONAL HAZARDS MODEL

10 Gibbs Sampling and Hierarchical Models

10.1 GIBBS SAMPLING PROCEDURE

10.2 THE GIBBS SAMPLER FOR THE NORMAL DISTRIBUTION

10.3 HIERARCHICAL MODELS AND GIBBS SAMPLING

10.4 MODELLING RELATED POPULATIONS WITH HIERARCHICAL MODELS

Appendix: Proof That Improper Jeffreys' Prior Distribution for the Hypervariance Can Lead to an Improper Posterior

11 Going Forward with Markov Chain Monte Carlo

A Using the Included Minitab Macros

B Using the Included R Functions

References

WILEY SERIES IN COMPUTATIONAL STATISTICS

Index

WILEY SERIES IN COMPUTATIONAL STATISTICS

Consulting Editors:

Paolo Giudici
University of Pavia, Italy

Geof H. Givens

Bani K. Mallick
Texas A&M University, USA

Wiley Series in Computational Statistics is comprised of practical guides and cutting-edge research books on new developments in computational statistics. It features quality authors with a strong applications focus. The texts in the series provide detailed coverage of statistical concepts, methods and case studies in areas at the interface of statistics, computing, and numerics.

With sound motivation and a wealth of practical examples, the books show in concrete terms how to select and to use appropriate ranges of statistical computing techniques in particular fields of study. Readers are assumed to have a basic understanding of introductory terminology.

The series concentrates on applications of computational methods in statistics to fields of bioinformatics, genomics, epidemiology, business, engineering, finance and applied statistics.

A complete list of titles in this series appears at the end of the volume.

This book is dedicated to

Sylvie,

Ben, Rachel,

Mary, and Elizabeth

Preface

In theory, Bayesian statistics is very simple. The posterior is proportional to the prior times likelihood. This gives the shape of the posterior, but it is not a density so it cannot be used for inference. The exact scale factor needed to make this a density can be found only in a few special cases. For other cases, the scale factor requires a numerical integration, which may be difficult when there are multiple parameters. So in practice, Bayesian statistics is more difficult, and this has held back its use for applied statistical problems.
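
To make the role of the scale factor concrete, here is a minimal sketch of finding it by numerical integration for a single parameter. The example itself is an assumption chosen for illustration and is not taken from the text: a flat prior on (0, 1) and a binomial likelihood with y = 7 successes in n = 10 trials. The function names are hypothetical. This is one of the special cases where the exact scale factor is known, so we can check the numerical answer against it.

```python
import math

def unnormalized_posterior(theta, y=7, n=10):
    """Prior times likelihood: flat prior on (0, 1), binomial likelihood.

    The values y = 7 and n = 10 are illustrative assumptions.
    """
    return theta**y * (1.0 - theta)**(n - y)

def normalize_on_grid(shape, num_points=10_000):
    """Integrate the posterior shape over (0, 1) with the midpoint rule.

    Returns the grid, the normalized density on the grid, and the scale factor.
    """
    width = 1.0 / num_points
    grid = [(i + 0.5) * width for i in range(num_points)]
    heights = [shape(t) for t in grid]
    scale = sum(heights) * width          # numerical integral of the shape
    density = [h / scale for h in heights]
    return grid, density, scale

grid, density, scale = normalize_on_grid(unnormalized_posterior)

# Posterior mean computed from the numerical density.
posterior_mean = sum(t * d for t, d in zip(grid, density)) / len(grid)

# For this special case the exact scale factor is the beta function B(8, 4).
exact_scale = math.gamma(8) * math.gamma(4) / math.gamma(12)

print("numerical scale factor:", scale)
print("exact scale factor:    ", exact_scale)
print("posterior mean:        ", posterior_mean)
```

With one parameter a grid of this kind is cheap. With several parameters the number of grid points grows as a power of the dimension, which is the difficulty referred to above.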

Computational Bayesian statistics has changed all this. It is based on the big idea that statistical inferences can be based on a random sample drawn from the posterior. The algorithms that are used allow us to draw samples from the exact posterior even when we only know its shape and we do not know the scale factor needed to make it an exact density. These algorithms include direct methods where a random sample drawn from an easily sampled distribution is reshaped by only accepting some candidate values into the final sample. More sophisticated algorithms are based on setting up a Markov chain that has the posterior as its long-run distribution. When the chain is allowed to run a sufficiently long time, a draw from the chain can be considered a random draw from the target (posterior) distribution. These algorithms are particularly well suited for complicated models with many parameters. This is revolutionizing applied statistics. Now applied statistics based on these computational Bayesian methods can be easily accomplished in practice.
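
The direct approach described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation supplied with the text: the target shape is the unnormalized posterior for a flat prior and 7 successes in 10 binomial trials, the candidate density is uniform(0, 1), and the function names are hypothetical. A candidate is accepted with probability equal to the target shape divided by an upper bound on it, so the accepted values follow the exact posterior even though the scale factor is never computed.

```python
import random

def target_shape(theta):
    """Unnormalized posterior: flat prior, 7 successes in 10 trials (assumed example)."""
    return theta**7 * (1.0 - theta)**3

def acceptance_rejection_sample(shape, bound, size, rng):
    """Draw `size` values from the density proportional to `shape` on (0, 1).

    Candidates come from uniform(0, 1), whose density is 1, so `bound` only
    needs to satisfy shape(theta) <= bound for every theta in (0, 1).
    """
    sample = []
    while len(sample) < size:
        candidate = rng.random()          # draw from the easily sampled density
        u = rng.random()                  # uniform(0, 1) for the accept step
        if u * bound < shape(candidate):  # accept with probability shape / bound
            sample.append(candidate)
    return sample

rng = random.Random(1)                    # fixed seed so the run is repeatable
bound = target_shape(0.7)                 # the shape is maximized at theta = 7/10
draws = acceptance_rejection_sample(target_shape, bound, 5000, rng)

sample_mean = sum(draws) / len(draws)
print("posterior mean estimated from the sample:", sample_mean)
```

Inference then proceeds from the accepted draws. Here the sample mean estimates the posterior mean, which for this example is 8/12.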

Features of the text

This text grew out of a course I developed at Waikato University. My goal for that course and this text is to bring these exciting developments to upper-level undergraduate and first-year graduate students of statistics. This text introduces this big idea to students in such a way that they can develop a strategy for making statistical inferences in this way. This requires an understanding of the pitfalls that can arise when using this approach, what can be done to avoid them, and how to recognize them if they are occurring. The practitioner has many choices to make in using this approach. Poor choices will lead to incorrect inferences. Sensible choices will lead to satisfactory inferences in an efficient manner.

This text follows a step-by-step development. In Chapter 1 we learn about the similarities and differences between the Bayesian and the likelihood approaches to statistics. This is important because when a flat prior is used, the posterior has the same shape as the likelihood function, yet the two approaches use different methods for inference. The Bayesian approach allows us to interpret the posterior as a probability density, and it is this interpretation that leads to the advantages of this approach. In Chapter 2 we examine direct approaches to drawing a random sample from the posterior even when we only know its shape. These approaches reshape a random sample drawn from another easily sampled density by only accepting some of the candidates into the final sample. These methods are satisfactory for models with only a few parameters provided the candidate density has heavier tails than the target. For models with many parameters direct methods become very inefficient. In these models, direct methods still may have a role as a small step in a larger Markov chain Monte Carlo algorithm. In Chapter 3 we show how statistical inferences can be made from a random sample from the posterior in a completely analogous way to the corresponding inferences taken from a numerically calculated posterior. In Chapter 4 we study the distributions from the one-dimensional exponential family. When the observations come from a member of this family, and the prior is from the conjugate family, then the posterior will be another member of the conjugate family. It can easily be found by simple updating rules. We also look at the normal distribution with unknown mean and variance, which is a member of the two-dimensional exponential family, and the multivariate normal and normal regression models. These exponential family cases are the only cases where the formula for the posterior can be found analytically. Before the development of computing, Bayesian statistics could only be done in practice in these few cases. 
We will use these as steps in a larger model. In Chapter 5 we introduce Markov chains. An understanding of Markov chains and their long-run behavior is needed before we study the more advanced algorithms in the book. Things that can happen in a Markov chain can also happen in a Markov chain Monte Carlo (MCMC) model. This chapter finishes with the Metropolis algorithm. This algorithm allows us to take a Markov chain and find a new Markov chain from it that will have the target (posterior) as its long-run distribution. In Chapter 6 we introduce the Metropolis-Hastings algorithm and show that how it performs depends on whether we use a random-walk or independent candidate density. We show how, in a multivariate case, we can either draw all the parameters at once, or blockwise, and that the Gibbs sampler is a special case of blockwise Metropolis-Hastings. In Chapter 7 we investigate how the mixing properties of the chain depend on the choice of the candidate density. We show how to find a heavy-tailed candidate density starting from the maximum likelihood estimator and matched curvature covariance matrix. We show that this will lead to a very efficient MCMC process. We investigate several methods for deciding on burn-in time and thinning required to get an approximately random sample from the posterior density from the MCMC output as the basis for inference. In Chapter 8 we apply this to the logistic regression model. This is a generalized linear model, and we find the maximum likelihood estimator and matched curvature covariance matrix using iteratively reweighted least squares. In the cases where we have a normal prior, we can find the approximate normal posterior by the simple updating rules we studied in Chapter 4. We use the Student's t equivalent as the heavy-tailed independent candidate density for the Metropolis-Hastings algorithm. After burn-in, a draw from the Markov chain will be a random draw from the exact posterior, not the normal approximation. 
We discuss how to determine priors for this model. We also investigate strategies to remove variables from the model to get a better prediction model. In Chapter 9, we apply these same ideas to the Poisson regression model. The proportional hazards model turns out to have the same likelihood as a Poisson, so these ideas apply here as well. In Chapter 10 we investigate the Gibbs sampling algorithm. We demonstrate it on the normal(μ, σ²) model where both parameters are unknown for both the independent prior case and the joint conjugate prior case. We see the Gibbs sampler is particularly well suited when we have a hierarchical model. In that case, we can draw a directed acyclic graph showing the dependency structure of the parameters. The conditional distribution of each block of parameters given all other blocks has a particularly easy form. In Chapter 11, we discuss methods for speeding up convergence in Gibbs sampling. We also direct the reader to more advanced topics that are beyond the scope of the text.
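
As a small illustration of the Gibbs sampler for the normal(μ, σ²) model with independent priors, the following sketch alternates draws from the two conditional distributions. Everything specific here is an assumption made for the example rather than the parameterization used in the text: the prior for μ is normal(m, s²), the prior for σ² is inverse-gamma with shape a and scale b, the data are simulated, and the function name is hypothetical.

```python
import random

def gibbs_normal(y, m, s2, a, b, num_draws, burn_in, rng):
    """Gibbs sampler for normal data with unknown mean mu and variance sigma2.

    Assumed independent priors: mu ~ normal(m, s2), sigma2 ~ inverse-gamma(a, b).
    Returns the (mu, sigma2) pairs drawn after the burn-in period.
    """
    n = len(y)
    ybar = sum(y) / n
    mu, sigma2 = ybar, 1.0                      # arbitrary starting values
    draws = []
    for step in range(burn_in + num_draws):
        # Draw mu given sigma2: normal, combining prior and data precisions.
        precision = 1.0 / s2 + n / sigma2
        mean = (m / s2 + n * ybar / sigma2) / precision
        mu = rng.gauss(mean, precision ** -0.5)
        # Draw sigma2 given mu: inverse-gamma(a + n/2, b + sum of squares / 2).
        ss = sum((yi - mu) ** 2 for yi in y)
        # If G ~ gamma(shape, scale 1), then rate / G is inverse-gamma(shape, rate).
        sigma2 = (b + ss / 2.0) / rng.gammavariate(a + n / 2.0, 1.0)
        if step >= burn_in:
            draws.append((mu, sigma2))
    return draws

rng = random.Random(2)                          # fixed seed so the run is repeatable
y = [rng.gauss(10.0, 2.0) for _ in range(50)]   # simulated data, for illustration only
ybar = sum(y) / len(y)

draws = gibbs_normal(y, m=0.0, s2=1.0e6, a=0.01, b=0.01,
                     num_draws=4000, burn_in=500, rng=rng)

mu_mean = sum(d[0] for d in draws) / len(draws)
sigma2_mean = sum(d[1] for d in draws) / len(draws)
print("posterior mean of mu:    ", mu_mean)
print("posterior mean of sigma2:", sigma2_mean)
```

Each conditional distribution is one of the conjugate cases, which is why the individual draws are easy. The burn-in length of 500 is a placeholder, not a recommendation.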

Software

I have developed Minitab macros that perform the computational methods shown in the text. My colleague, Dr. James Curran, has written corresponding R functions. These may be downloaded from the following website.

Acknowledgments

I would like to acknowledge the help I have had from many people. First, I thank my students over the past three years, whose enthusiasm for the early drafts encouraged me to continue writing. My colleague, Dr. James Curran, who wrote the R functions and wrote Appendix B on how to implement them, has made a major contribution to this book. I want to thank Dr. Gerry Devlin, the Clinical Director of Cardiology at the Waikato Hospital, for letting me use the data from the Waikato District Health Board Cardiac Survival Study, Dr. Neil Swanson and Gaelle Dutu for discussing this dataset with me, and my student Yihong Zhang, who assisted me on this study. I also want to thank Dr. David Fergusson from the Dept. of Psychological Medicine at the Christchurch School of Medicine and Health Sciences for letting me use the circumcision data from the longitudinal study of a birth cohort. I want to thank my colleagues at the University of Waikato, Dr. Murray Jorgensen, Dr. Judi McWhirter, Dr. Lyn Hunt, Dr. Kevin Broughan, and my former student Dr. Jason Catchpole, who all proofread parts of the manuscript. I appreciated their helpful comments, and any errors that remain are solely my responsibility. I would like to thank Cathy Akritas at Minitab for her help in improving my Minitab macros. I would like to thank Steve Quigley, Jackie Palmieri, Melissa Yanuzzi, and the team at John Wiley & Sons for their support, as well as Amy Hendrickson of TeXnology, Inc. for help with LaTeX.

Last but not least, I wish to thank my wife Sylvie for her constant love and support.