CONTENTS

PREFACE

ACKNOWLEDGMENTS

ACRONYMS

CHAPTER 1: INTRODUCTION TO BAYESIAN INFERENCE

1.1 INTRODUCTION: BAYESIAN MODELING IN THE 21ST CENTURY

1.2 DEFINITION OF STATISTICAL MODELS

1.3 BAYES THEOREM

1.4 MODEL-BASED BAYESIAN INFERENCE

1.5 INFERENCE USING CONJUGATE PRIOR DISTRIBUTIONS

1.6 NONCONJUGATE ANALYSIS

Problems

CHAPTER 2: MARKOV CHAIN MONTE CARLO ALGORITHMS IN BAYESIAN INFERENCE

2.1 SIMULATION, MONTE CARLO INTEGRATION, AND THEIR IMPLEMENTATION IN BAYESIAN INFERENCE

2.2 MARKOV CHAIN MONTE CARLO METHODS

2.3 POPULAR MCMC ALGORITHMS

2.4 SUMMARY AND CLOSING REMARKS

Problems

CHAPTER 3: WinBUGS SOFTWARE: INTRODUCTION, SETUP, AND BASIC ANALYSIS

3.1 INTRODUCTION AND HISTORICAL BACKGROUND

3.2 THE WinBUGS ENVIRONMENT

3.3 PRELIMINARIES ON USING WinBUGS

3.4 BUILDING BAYESIAN MODELS IN WinBUGS

3.5 COMPILING THE MODEL AND SIMULATING VALUES

3.6 BASIC OUTPUT ANALYSIS USING THE SAMPLE MONITOR TOOL

3.7 SUMMARIZING THE PROCEDURE

3.8 CHAPTER SUMMARY AND CONCLUDING COMMENTS

Problems

CHAPTER 4: WinBUGS SOFTWARE: ILLUSTRATION, RESULTS, AND FURTHER ANALYSIS

4.1 A COMPLETE EXAMPLE OF RUNNING MCMC IN WinBUGS FOR A SIMPLE MODEL

4.2 FURTHER OUTPUT ANALYSIS USING THE INFERENCE MENU

4.3 MULTIPLE CHAINS

4.4 CHANGING THE PROPERTIES OF A FIGURE

4.5 OTHER TOOLS AND MENUS

4.6 SUMMARY AND CONCLUDING REMARKS

Problems

CHAPTER 5: INTRODUCTION TO BAYESIAN MODELS: NORMAL MODELS

5.1 GENERAL MODELING PRINCIPLES

5.2 MODEL SPECIFICATION IN NORMAL REGRESSION MODELS

5.3 USING VECTORS AND MULTIVARIATE PRIORS IN NORMAL REGRESSION MODELS

5.4 ANALYSIS OF VARIANCE MODELS

Problems

CHAPTER 6: INCORPORATING CATEGORICAL VARIABLES IN NORMAL MODELS AND FURTHER MODELING ISSUES

6.1 ANALYSIS OF VARIANCE MODELS USING DUMMY VARIABLES

6.2 ANALYSIS OF COVARIANCE MODELS

6.3 A BIOASSAY EXAMPLE

6.4 FURTHER MODELING ISSUES

6.5 CLOSING REMARKS

Problems

CHAPTER 7: INTRODUCTION TO GENERALIZED LINEAR MODELS: BINOMIAL AND POISSON DATA

7.1 INTRODUCTION

7.2 PRIOR DISTRIBUTIONS

7.3 POSTERIOR INFERENCE

7.4 POISSON REGRESSION MODELS

7.5 BINOMIAL RESPONSE MODELS

7.6 MODELS FOR CONTINGENCY TABLES

Problems

CHAPTER 8: MODELS FOR POSITIVE CONTINUOUS DATA, COUNT DATA, AND OTHER GLM-BASED EXTENSIONS

8.1 MODELS WITH NONSTANDARD DISTRIBUTIONS

8.2 MODELS FOR POSITIVE CONTINUOUS RESPONSE VARIABLES

8.3 ADDITIONAL MODELS FOR COUNT DATA

8.4 FURTHER GLM-BASED MODELS AND EXTENSIONS

Problems

CHAPTER 9: BAYESIAN HIERARCHICAL MODELS

9.1 INTRODUCTION

9.2 SOME SIMPLE EXAMPLES

9.3 THE GENERALIZED LINEAR MIXED MODEL FORMULATION

9.4 DISCUSSION, CLOSING REMARKS, AND FURTHER READING

Problems

CHAPTER 10: THE PREDICTIVE DISTRIBUTION AND MODEL CHECKING

10.1 INTRODUCTION

10.2 ESTIMATING THE PREDICTIVE DISTRIBUTION FOR FUTURE OR MISSING OBSERVATIONS USING MCMC

10.3 USING THE PREDICTIVE DISTRIBUTION FOR MODEL CHECKING

10.4 USING CROSS-VALIDATION PREDICTIVE DENSITIES FOR MODEL CHECKING, EVALUATION, AND COMPARISON

10.5 ILLUSTRATION OF A COMPLETE PREDICTIVE ANALYSIS: NORMAL REGRESSION MODELS

10.6 DISCUSSION

Problems

CHAPTER 11: BAYESIAN MODEL AND VARIABLE EVALUATION

11.1 PRIOR PREDICTIVE DISTRIBUTIONS AS MEASURES OF MODEL COMPARISON: POSTERIOR MODEL ODDS AND BAYES FACTORS

11.2 SENSITIVITY OF THE POSTERIOR MODEL PROBABILITIES: THE LINDLEY-BARTLETT PARADOX

11.3 COMPUTATION OF THE MARGINAL LIKELIHOOD

11.4 COMPUTATION OF THE MARGINAL LIKELIHOOD USING WinBUGS

11.5 BAYESIAN VARIABLE SELECTION USING GIBBS-BASED METHODS

11.6 POSTERIOR INFERENCE USING THE OUTPUT OF BAYESIAN VARIABLE SELECTION SAMPLERS

11.7 IMPLEMENTATION OF GIBBS VARIABLE SELECTION IN WinBUGS USING AN ILLUSTRATIVE EXAMPLE

11.8 THE CARLIN–CHIB METHOD

11.9 REVERSIBLE JUMP MCMC (RJMCMC)

11.10 USING POSTERIOR PREDICTIVE DENSITIES FOR MODEL EVALUATION

11.11 INFORMATION CRITERIA

11.12 DISCUSSION AND FURTHER READING

Problems

APPENDIX A: MODEL SPECIFICATION VIA DIRECTED ACYCLIC GRAPHS: THE DOODLE MENU

A.1 INTRODUCTION: STARTING WITH DOODLE

A.2 NODES

A.3 EDGES

A.4 PANELS

A.5 A SIMPLE EXAMPLE

APPENDIX B: THE BATCH MODE: RUNNING A MODEL IN THE BACKGROUND USING SCRIPTS

B.1 INTRODUCTION

B.2 BASIC COMMANDS: COMPILING AND RUNNING THE MODEL

APPENDIX C: CHECKING CONVERGENCE USING CODA/BOA

C.1 INTRODUCTION

C.2 A SHORT HISTORICAL REVIEW

C.3 DIAGNOSTICS IMPLEMENTED BY CODA/BOA

C.4 A FIRST LOOK AT CODA/BOA

C.5 A SIMPLE EXAMPLE

APPENDIX D: NOTATION SUMMARY

D.1 MCMC

D.2 SUBSCRIPTS AND INDICES

D.3 PARAMETERS

D.4 RANDOM VARIABLES AND DATA

D.5 SAMPLE ESTIMATES

D.6 SPECIAL FUNCTIONS, VECTORS, AND MATRICES

D.7 DISTRIBUTIONS

D.8 DISTRIBUTION-RELATED NOTATION

D.9 NOTATION USED IN ANOVA AND ANCOVA

D.10 VARIABLE AND MODEL SPECIFICATION

D.11 DEVIANCE INFORMATION CRITERION (DIC)

D.12 PREDICTIVE MEASURES

REFERENCES

INDEX

WILEY SERIES IN COMPUTATIONAL STATISTICS

Consulting Editors:

Paolo Giudici
University of Pavia, Italy

Geof H. Givens
Colorado State University, USA

Bani K. Mallick
Texas A&M University, USA

The Wiley Series in Computational Statistics comprises practical guides and cutting-edge research books on new developments in computational statistics. It features quality authors with a strong applications focus. The texts in the series provide detailed coverage of statistical concepts, methods, and case studies in areas at the interface of statistics, computing, and numerics.

With sound motivation and a wealth of practical examples, the books show in concrete terms how to select and to use appropriate ranges of statistical computing techniques in particular fields of study. Readers are assumed to have a basic understanding of introductory terminology.

The series concentrates on applications of computational methods in statistics to the fields of bioinformatics, genomics, epidemiology, business, engineering, finance, and applied statistics.

A complete list of titles in this series appears at the end of the volume.

To Ioanna and our baby daughter

PREFACE

Since the mid-1980s, the development of widely accessible, powerful computers and the implementation of Markov chain Monte Carlo (MCMC) methods have led to an explosion of interest in Bayesian statistics and modeling. This was followed by extensive research into new Bayesian methodologies, which made it practical to apply complicated models across a wide range of sciences. During the late 1990s, BUGS came to the foreground. BUGS was free software that could fit complicated models in a relatively easy manner, using standard MCMC methods. Since 1998 or so, WinBUGS, the Windows version of BUGS, has earned great popularity among researchers in diverse scientific fields. This has created an increased need for an introductory book on Bayesian models and their implementation via WinBUGS.

The objective of the present book is to offer an introduction to the principles of Bayesian modeling, with emphasis on model building and model implementation using WinBUGS. Detailed examples are provided, ranging from very simple to more advanced and realistic ones. Generalized linear models (GLMs), which are familiar to most students and researchers, are discussed. Details concerning model building, prior specification, writing the WinBUGS code and the analysis and interpretation of the WinBUGS output are also provided. Because of the introductory character of the book, I focused on elementary models, starting from the normal regression models and moving to generalized linear models. Even more advanced readers, familiar with such models, may benefit from the Bayesian implementation using WinBUGS.

Basic knowledge of probability theory and statistics is assumed. Computations that could not be performed in WinBUGS are illustrated using R. Therefore, a minimum knowledge of R is also required.

This manuscript can be used as the main textbook in a second-level course in Bayesian statistics focusing on modeling and/or computation. Alternatively, it can serve as a companion (to a main textbook) in an introductory course in Bayesian statistics. Finally, because of its structure, postgraduate students and other researchers can complete a self-taught tutorial course on Bayesian modeling by following the material of this book.

All datasets and code used in the book are available on the book’s webpage: www.stat-athens.aueb.gr/~jbn/winbugs_book.

IOANNIS NTZOUFRAS

Athens, Greece
June 29, 2008

ACKNOWLEDGMENTS

I am indebted to the people at Wiley publications for their understanding and assistance during the preparation of the manuscript. Acknowledgments are due to the anonymous referees, whose suggestions and comments led to a substantial improvement of the present book. I would particularly like to thank Dimitris Fouskakis, colleague and good friend, for his valuable comments on an early version of Chapters 1–6 and 10–11. I am also grateful to Professor Brani Vidakovic for proposing and motivating this book. Last but not least, I wish to thank my wife Ioanna for her love, support, and patience during the writing of this book as well as for her suggestions on the manuscript.

I. N.

ACRONYMS

ACF Autocorrelation function
AIC Akaike information criterion
ANOVA Analysis of variance
ANCOVA Analysis of covariance
AR Attributable risk
BF Bayes factor
BIC Bayes information criterion
BOA Bayesian output analysis (R package)
BP Bivariate Poisson
BOD Biological oxygen demand (data variable in example 6.3)
BUGS Bayesian inference using Gibbs sampling (software)
CDF Cumulative distribution function
COD Chemical oxygen demand (data variable in example 6.3)
CODA Convergence diagnostics and output analysis software for Gibbs sampling analysis (R package)
CPO Conditional predictive ordinate
CR Corner (constraint)
CV Cross-validation
CV-1 Leave-one-out cross-validation
DAG Directed acyclic graph
DI Dispersion index
DIBP Diagonal inflated bivariate Poisson distribution
DIC Deviance information criterion
GLM Generalized linear model
GP Generalized Poisson
GVS Gibbs variable selection
ICPO Inverse conditional predictive ordinate
i.i.d. Independent identically distributed
LS Logarithmic score
MAP Maximum a posteriori
MP Median probability (model)
MCMC Markov chain Monte Carlo
MCE Monte Carlo error
ML Maximum likelihood
MLE Maximum-likelihood estimate/estimator
NB Negative binomial
OR Odds ratio
PBF Posterior Bayes factor
PD Poisson difference
p.d.f. Probability density function
PO Posterior model odds
PPO Posterior predictive ordinate
RJMCMC Reversible jump Markov chain Monte Carlo
RR Relative risk
SD Standard deviation
SE Standard error
SSVS Stochastic search variable selection
STZ Sum-to-zero (constraint)
TS Total solids (data variable in example 6.3)
TVS Total volatile solids (data variable in example 6.3)
WinBUGS Windows version of BUGS (software)
ZI Zero-inflated
ZID Zero-inflated distribution
ZIP Zero-inflated Poisson distribution
ZINB Zero-inflated negative binomial distribution
ZIGP Zero-inflated generalized Poisson distribution
ZIBP Zero-inflated bivariate Poisson distribution