
INTRODUCTION TO
BAYESIAN STATISTICS


Third Edition



WILLIAM M. BOLSTAD
JAMES M. CURRAN




























This book is dedicated to

Sylvie,
Ben, Rachel,
Emily, Mary, and
Elizabeth

PREFACE

Our original goal for this book was to introduce Bayesian statistics at the earliest possible stage to students with a reasonable mathematical background. This entailed covering a similar range of topics to an introductory statistics text, but from a Bayesian perspective. The emphasis is on statistical inference. We wanted to show how Bayesian methods can be used for inference and how they compare favorably with the frequentist alternatives. This book is meant to be a good place to start the study of Bayesian statistics. From the many positive comments we have received from users, we think the book succeeded in this goal. A course based on this goal would include Chapters 1-14.

Our feedback also showed that many users were taking up the book at a more intermediate level instead of the introductory level originally envisaged. The topics covered in Chapters 2 and 3 would be old hat for these users, so we had to include some more advanced material to cater for the needs of that group. The second edition aimed to meet this new goal as well as the original one. We included more models, mainly with a single parameter. Nuisance parameters were dealt with using approximations. A course based on this goal would include Chapters 4-16.

Changes in the Third Edition

Later feedback showed that some readers with a stronger mathematical and statistical background wanted the text to include more details on how to deal with multi-parameter models. The third edition contains four new chapters to satisfy this additional goal, along with some minor rewriting of the existing chapters. Chapter 17 covers Bayesian inference for Normal observations where we know neither the mean nor the variance. This chapter extends the ideas in Chapter 11, and also discusses the two-sample case, which in turn allows the reader to consider inference on the difference between two means. Chapter 18 introduces the Multivariate Normal distribution, which we need in order to discuss multiple linear regression in Chapter 19. Finally, Chapter 20 takes the user beyond the kind of conjugate analysis considered in most of the book, and into the realm of computational Bayesian inference. The topics in Chapter 20 are covered with an intentionally light touch, but they still give the user valuable information and skills that will allow them to deal with different problems. We have included some new exercises and new computer exercises which use new Minitab macros and R functions. The Minitab macros can be downloaded from the book website: http://introbayes.ac.nz. The new R functions have been incorporated in a new and improved version of the R package Bolstad, which can either be downloaded from a CRAN mirror or installed directly from within R over the internet. Instructions on the use and installation of the Minitab macros and the Bolstad package in R are given in Appendices C and D respectively. Both of these appendices have been rewritten to accommodate changes in R and Minitab that have occurred since the second edition.

Our Perspective on Bayesian Statistics

A book can be characterized as much by what is left out as by what is included. This book is our attempt to show a coherent view of Bayesian statistics as a good way to do statistical inference. Details that are outside the scope of the text are included in footnotes. Here are some of the reasons behind our choices of which topics to include or exclude.

In particular, we did not mention decision theory or loss functions when discussing Bayesian statistics. In many books, Bayesian statistics gets compartmentalized into decision theory while inference is presented in the frequentist manner. While decision theory is a very interesting topic in its own right, we wanted to present the case for Bayesian statistical inference, and did not want to get sidetracked.

We think that in order to get the full benefit of Bayesian statistics, one really has to consider all priors subjective. They are either (1) a summary of what you believe or (2) a summary of all that you allow yourself to believe initially. We consider the subjective prior as the relative weights given to each possible parameter value, before looking at the data. Even if we use a flat prior to give all possible values equal prior weight, it is subjective since we chose it. In any case, it gives all values equal weight only in that parameterization, so it can be considered “objective” only in that parameterization. In this book we do not wish to dwell on the problems associated with trying to be objective in Bayesian statistics. We explain why universal objectivity is not possible (in a footnote, since we do not want to distract the reader). We want to leave the reader with the “relative weight” idea of the prior in the parameterization in which the problem is posed.
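To make the point about parameterization concrete, here is a standard change-of-variables calculation (our illustration, not an example from the text): a uniform prior $g_p(p) = 1$ on a binomial success probability $p$ is not uniform on the log-odds $\theta = \log\{p/(1-p)\}$, since

```latex
g_\theta(\theta)
  = g_p\!\bigl(p(\theta)\bigr)\left|\frac{dp}{d\theta}\right|
  = \frac{e^{\theta}}{\bigl(1+e^{\theta}\bigr)^{2}},
```

which puts most of its weight near $\theta = 0$, that is, near $p = 1/2$. A prior that is flat in one parameterization is generally not flat in another.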

In the first edition we did not mention Jeffreys' prior explicitly, although the beta prior for the binomial and the flat prior for the normal mean are in fact the Jeffreys' priors for those respective observation distributions. In the second edition we do mention Jeffreys' prior for the binomial, Poisson, normal mean, and normal standard deviation. In the third edition we mention the independent Jeffreys' priors for the normal mean and standard deviation. In particular, we do not want to get the reader involved with the problems with Jeffreys' prior, such as the joint Jeffreys' prior for mean and variance together, as opposed to independent Jeffreys' priors, or the Jeffreys' prior violating the likelihood principle. These are beyond the level we wish to go. We just want the reader to note the Jeffreys' priors in these cases as possible priors, the relative weights they give, when they may be appropriate, and how to use them. Mathematically, all parameterizations are equally valid; however, usually only the main one is very meaningful. We want the reader to focus on relative weights for their parameterization as the prior. It should be (a) a summary of their prior belief (a conjugate prior matching their prior beliefs about moments or the median), (b) flat (hence objective) for their parameterization, or (c) some other form that gives reasonable weight over the whole range of possible values. The posteriors will be similar for all priors that have reasonable weight over the whole range of possible values.

Bayesian inference on the standard deviation of the normal was done with the mean considered a known parameter. The conjugate prior for the variance is the inverse chi-squared distribution. Our intuition is about the standard deviation, yet we are doing Bayes' theorem on the variance. This required introducing the change of variable formula for the prior density.
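The change of variable step can be written compactly (a standard transformation result, not a formula quoted from the text): if the prior density on the variance is $g_{\sigma^2}$, the implied prior on the standard deviation $\sigma = \sqrt{\sigma^2}$ is

```latex
g_{\sigma}(\sigma)
  = g_{\sigma^2}\!\left(\sigma^{2}\right)
    \left|\frac{d\bigl(\sigma^{2}\bigr)}{d\sigma}\right|
  = 2\sigma\, g_{\sigma^2}\!\left(\sigma^{2}\right),
  \qquad \sigma > 0,
```

so a prior specified on the variance carries an extra factor of $2\sigma$ when re-expressed on the standard deviation scale.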

In the second edition we considered the mean as known. This avoided the mathematically more advanced case where both the mean and the standard deviation are unknown. In the third edition we now cover this topic in Chapter 17. In earlier editions Student's t is presented as the required adjustment to credible intervals for the mean when the variance is estimated from the data. In the third edition we show in Chapter 17 that this is in fact the result when the joint posterior is found and the variance is marginalized out. Chapter 17 also covers inference on the difference between two means. This problem is made substantially harder when one relaxes the assumption that both populations have the same variance. Chapter 17 derives the Bayesian solution to the well-known Behrens-Fisher problem for the difference between two population means with unequal population variances. The function bayes.t.test in the R package for this book gives the user a numerical solution using Gibbs sampling. Gibbs sampling is covered in Chapter 20 of this new edition.
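Purely as an illustration of the idea behind Gibbs sampling, here is a minimal sketch in Python (rather than the book's own R and Minitab code) for the simpler one-sample Normal model with both mean and variance unknown. The model, priors, and all numerical values here are our own choices for the sketch, not taken from the text or from the bayes.t.test implementation.

```python
import math
import random
import statistics

def gibbs_normal(data, n_iter=2000, burn_in=500, seed=1,
                 mu0=0.0, tau0=10.0, a0=1.0, b0=1.0):
    """Gibbs sampler for Normal(mu, sigma^2) data with semi-conjugate
    priors: mu ~ Normal(mu0, tau0^2) and sigma^2 ~ InverseGamma(a0, b0).
    Alternately draws each parameter from its full conditional."""
    rng = random.Random(seed)
    n = len(data)
    ybar = statistics.fmean(data)
    mu, sigma2 = ybar, statistics.pvariance(data)  # start at data-based values
    mu_draws, sigma2_draws = [], []
    for i in range(n_iter):
        # Full conditional of mu: Normal, precision = 1/tau0^2 + n/sigma^2
        v = 1.0 / (1.0 / tau0 ** 2 + n / sigma2)
        m = v * (mu0 / tau0 ** 2 + n * ybar / sigma2)
        mu = rng.normalvariate(m, math.sqrt(v))
        # Full conditional of sigma^2: InverseGamma(a0 + n/2, b0 + SS/2),
        # drawn as the reciprocal of a Gamma variate (gammavariate uses scale)
        ss = sum((y - mu) ** 2 for y in data)
        sigma2 = 1.0 / rng.gammavariate(a0 + n / 2.0, 1.0 / (b0 + ss / 2.0))
        if i >= burn_in:
            mu_draws.append(mu)
            sigma2_draws.append(sigma2)
    return mu_draws, sigma2_draws

# Demo on simulated Normal(5, 1) data: the posterior means should sit
# close to the sample mean and sample variance.
sim = random.Random(42)
data = [sim.normalvariate(5.0, 1.0) for _ in range(50)]
mu_draws, sigma2_draws = gibbs_normal(data)
print(statistics.fmean(mu_draws), statistics.fmean(sigma2_draws))
```

The same alternating scheme, applied to the two means and two variances of the Behrens-Fisher setting, is what a Gibbs-sampling solution to that problem does.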

Acknowledgments

WMB would like to thank all the readers who have sent him comments and pointed out misprints in the first and second editions. These have been corrected. WMB would like to thank Cathy Akritas and Gonzalo Ovalles at Minitab for help in improving his Minitab macros. WMB and JMC would like to thank Jon Gurstelle, Steve Quigley, Sari Friedman, Allison McGinniss, and the team at John Wiley & Sons for their support.

Finally, last but not least, WMB wishes to thank his wife Sylvie for her constant love and support.

WILLIAM M. “BILL” BOLSTAD

Hamilton, New Zealand

JAMES M. CURRAN

Auckland, New Zealand