Statistics in Practice

Series Advisors

Human and Biological Sciences

Stephen Senn

CRP-Santé, Luxembourg

Earth and Environmental Sciences

Marian Scott

University of Glasgow, UK

Industry, Commerce and Finance

Wolfgang Jank

University of Maryland, USA

Founding Editor

Vic Barnett

Nottingham Trent University, UK


Statistics in Practice is an important international series of texts which provide detailed coverage of statistical concepts, methods and worked case studies in specific fields of investigation and study.

With sound motivation and many worked practical examples, the books show in down-to-earth terms how to select and use an appropriate range of statistical techniques in a particular practical field within each title's special topic area.

The books provide statistical support for professionals and research workers across a range of employment fields and research environments. Subject areas covered include medicine and pharmaceutics; industry, finance and commerce; public services; the earth and environmental sciences, and so on.

The books also provide support to students studying statistical courses applied to the above areas. The demand for graduates to be equipped for the work environment has led to such courses becoming increasingly prevalent at universities and colleges.

It is our aim to present judiciously chosen and well-written workbooks to meet everyday practical needs. Feedback of views from readers will be most valuable to monitor the success of this aim.

A complete list of titles in this series appears at the end of the volume.

Applied Mixed Models
in Medicine

Third Edition

Helen Brown

The Roslin Institute
University of Edinburgh, UK

Robin Prescott

Centre for Population Health Sciences
University of Edinburgh, UK

Preface to third edition

Analysis of variance and regression have for many years been the mainstays of statistical modelling. These techniques usually have as a basic assumption that the residual or error terms are independently and identically distributed. Mixed models are an important approach to modelling, which allows us to relax the independence assumption and take into account more complicated data structures in a flexible way. Sometimes, this interdependence of observations is modelled directly in a mixed model. For example, if a number of repeated measurements are made on a patient, then mixed models allow us to specify a pattern for the correlation between these measurements. In other contexts, such as the cross-over clinical trial, specifying that patient effects are normally distributed, rather than fixed as in the classical approach, induces correlation between observations on the same patient.

There are many benefits to be gained from using mixed models. In some situations, the benefit will be an increase in the precision of our estimates. In others, we will be able to make wider inferences. We will sometimes be able to use a more appropriate model that will give us greater insight into what underpins the structure of the data. However, it is only the availability of software in versatile packages such as SAS® that has made these techniques widely accessible. It is now important that suitable information on their use becomes available so that they may be applied confidently on a routine basis.

Our intention in this book is to put all types of mixed models into a general framework and to consider the practical implications of their use. We aim to do this at a level that can be understood by applied statisticians and numerate scientists. Greatest emphasis is placed on skills required for the application of mixed models and interpretation of the results. An in-depth understanding of the mathematical theory underlying mixed models is not essential to gain these, but an awareness of the practical consequences of fitting different types of mixed models is necessary. While many publications are available on various aspects of mixed models, these generally relate to specific types of model and often differ in their use of terminology. Such publications are not always readily comprehensible to the applied statisticians who will be the most frequent users of the methods. An objective of this book is to help overcome this deficit.

Examples given will primarily relate to the medical field. However, the general concepts of mixed models apply equally to many other areas of application, for example, social sciences, agriculture, veterinary science and official statistics. (In the social sciences, mixed models are often referred to as ‘multi-level’ models.) Data are becoming easier to collect, with the consequence that datasets are now often large and complex. We believe that mixed models provide useful tools for modelling the complex structures that occur in such data.

The third edition of this book retains the structure of the first two, but there are further changes to reflect the continued evolution of SAS. This edition fully incorporates features of SAS up to version 9.3. Compared to what was available at the time of the previous edition, enhancements to SAS include improved graphical facilities. Importantly, there is also a new procedure, PROC MCMC, which facilitates Bayesian analysis. This has led to extensive changes in our coverage of Bayesian methods. SAS 9.3 and later versions now provide output both in text format from the output window and, additionally, as an HTML file in the results viewer. There have been accompanying minor changes in the details of outputs and graphs, such as labelling. In reporting SAS outputs in this edition, we have changed our presentation from earlier editions only where we wish to highlight features that have changed substantially. Importantly, this is intended to facilitate the reader's use of mixed models, whatever their version of SAS.

During the drafting of this edition, SAS 9.4 became available. It is not fully incorporated into this book because its new features are focused more on the SAS high performance procedures than on improvements to the SAS/STAT procedures. These high performance procedures ‘provide predictive modelling tools that have been specially developed to take advantage of parallel processing in both multithread single-machine mode and distributed multi-machine mode’. Typically, the high performance procedures such as PROC HPLMIXED have a greatly reduced range of options compared to PROC MIXED and, consequently, are peripheral to the aims of this book. We do, however, consider some of the small modifications to improve procedures such as GLIMMIX and MCMC that are available in SAS/STAT® 12.1 and later versions.

Chapter 1 provides an introduction to the capabilities of mixed models, defines general concepts and gives their basic statistical properties. Chapter 2 defines models and fitting methods for normally distributed data. Chapter 3 first introduces generalised linear models that can be used for the analysis of data that are binomial or Poisson or from any other member of the exponential family of distributions. These methods are then extended to incorporate mixed models concepts under the heading of generalised linear mixed models. The fourth chapter examines how mixed models can be applied when the variable to be analysed is categorical. The main emphasis in these chapters, and indeed in the whole book, is on classical statistical approaches to inference, based on significance tests and confidence intervals. However, the Bayesian approach is also introduced in Chapter 2, since it has several potential advantages and its use is becoming more widespread. Although the overall emphasis of the book is on the application of mixed models techniques, these chapters can also be used as a reference guide to the underlying theory of mixed models.

Chapters 5–7 consider the practical implications of using mixed models for particular designs. Each design illustrates a different feature of mixed models.

Multi-centre trials and meta-analyses are considered in Chapter 5. These are examples of hierarchical data structures, and the use of a mixed model allows for any additional variation in treatment effects occurring between centres (or trials) and hence makes results more generalisable. The methods shown can be applied equally to any type of hierarchical data.

In Chapter 6, the uses of covariance pattern models and random coefficients models are described using the repeated measures design. These approaches take into account the correlated nature of the repeated observations and give more appropriate treatment effect estimates and standard errors. The material in this chapter will apply equally to any situation where repeated observations are made on the same units.

Chapter 7 considers cross-over designs where each patient may receive several treatments. In this design, more accurate treatment estimates are often achieved by fitting patient effects as random. This improvement in efficiency can occur for any dataset where a fixed effect is ‘crossed’ with a random effect.

In Chapter 8, a variety of other designs and data structures is considered. These either incorporate several of the design aspects covered in Chapters 5–7 or have structures that have arisen in a more unplanned manner. They help to illustrate the broad scope of application of mixed models. This chapter includes two new sections. We have added a section on the analysis of bilateral data, a common structure in some areas of medical research, but one that we had not previously addressed. There is also a substantial new section on incomplete block designs.

Chapter 9 gives information on software available for fitting mixed models. Most of the analyses in the book are carried out using PROC MIXED in SAS, supplemented by PROC GENMOD, PROC GLIMMIX, and PROC MCMC. This chapter introduces the basic syntax for these procedures. This information should be sufficient for fitting most of the analyses described, but those who wish to use more complex features should consult the full SAS documentation. The SAS code used for most of the examples is supplied within the text. In addition, the example datasets and SAS code may be obtained electronically from www.wiley.com/go/brown/applied_mixed.

This book has been written to provide the reader with a thorough understanding of the concepts of mixed models, and we trust it will serve well for this purpose. However, readers wishing to take a shortcut to the fitting of normal mixed models should read Chapter 1 for an introduction, Section 2.4 for practical details, and the chapter relevant to their design. To fit non-normal or categorical mixed models, Section 3.3 or Section 4.4 should be read in addition to Section 2.4. In an attempt to make this book easier to use, we have presented at the beginning of the text a summary of the notation we have used, while at the end, we list some key definitions in a glossary.

Our writing of this book has been aided in many ways. The first edition evolved from a constantly changing set of course notes that accompanied a 3-day course on the subject, run regularly over the previous 6 years. The second edition was helped by many individuals who were kind enough to comment on the first edition, including the identification of some errors that had slipped in, and by further participants at our courses who have contributed to discussions and have thereby helped to shape our views. This process has continued with the third edition. We are also grateful to many other colleagues who have read and commented on various sections of the manuscript and especially to our colleagues who have allowed us to use their data. We hope that readers will find the resulting book a useful reference in an interesting and expanding area of statistics.

Helen Brown
Robin Prescott
Edinburgh

Mixed models notation

The notation below is provided for quick reference. Models are defined more fully in Sections 2.1, 3.1 and 4.1.

Normal mixed model

$$y = X\alpha + Z\beta + e,$$
$$\beta \sim N(0, G),$$
$$\mathrm{var}(e) = R,$$
$$\mathrm{var}(y) = V = ZGZ' + R.$$

Generalised linear mixed model

$$y = \mu + e,$$
$$g(\mu) = X\alpha + Z\beta,$$
$$\beta \sim N(0, G),$$
$$\mathrm{var}(e) = R,$$
$$\mathrm{var}(y) = V = \mathrm{var}(\mu) + R \approx BZGZ'B + R,$$

where

  1. $y$ = dependent variable,
  2. $e$ = residual error,
  3. $X$ = design matrix for fixed effects,
  4. $Z$ = design matrix for random effects,
  5. $\alpha$ = fixed effects parameters,
  6. $\beta$ = random effects parameters,
  7. $R$ = residual variance matrix,
  8. $G$ = matrix of covariance parameters,
  9. $V$ = var(y) variance matrix,
  10. $\mu$ = expected values,
  11. $g$ = link function,
  12. $B$ = diagonal matrix of variance terms (e.g. $B = \mathrm{diag}\{\mu_i(1 - \mu_i)\}$ for binary data).
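
As a minimal sketch of how the variance matrix of y arises in a normal mixed model, the Python fragment below assembles var(y) = ZGZ' + R for a toy dataset of two patients with two observations each. It assumes the standard form in which G is the covariance matrix of the random effects and R is the residual variance matrix. The variance components (4.0 and 1.0) and the helper functions are invented for illustration and are not taken from the book, whose own examples use SAS.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def mat_add(a, b):
    """Add two matrices of the same shape element by element."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

# Hypothetical variance components (illustrative values only).
sigma2_patient = 4.0   # variance of the random patient effects
sigma2_resid = 1.0     # residual variance

# Z maps each of the four observations to its patient (two per patient).
Z = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]

# G: independent random patient effects with a common variance.
G = [[sigma2_patient, 0.0],
     [0.0, sigma2_patient]]

# R: independent residuals with a common variance.
R = [[sigma2_resid if i == j else 0.0 for j in range(4)] for i in range(4)]

# V = Z G Z' + R
V = mat_add(mat_mul(mat_mul(Z, G), transpose(Z)), R)

# Correlation between the two observations on the first patient.
intra_patient_corr = V[0][1] / math.sqrt(V[0][0] * V[1][1])

for row in V:
    print(row)
print("intra-patient correlation:", intra_patient_corr)
```

With these illustrative values, two observations on the same patient have covariance 4.0 and correlation 4.0/5.0 = 0.8, while observations on different patients are uncorrelated. This is the sense in which fitting patient effects as random induces correlation within a patient.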

Ordered categorical mixed model

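$$y = \mu + e,$$
$$\mathrm{logit}(\mu^{[c]}) = X\alpha + Z\beta,$$
$$\beta \sim N(0, G),$$

where

  1. $\mu = (\mu_{11}, \mu_{12}, \ldots, \mu_{n1}, \mu_{n2}, \ldots)'$,
  2. $\mu_{ij}$ = probability observation i is in category j,
  3. $\mu^{[c]} = (\mu^{[c]}_{11}, \mu^{[c]}_{12}, \ldots, \mu^{[c]}_{n1}, \mu^{[c]}_{n2}, \ldots)'$,
  4. $\mu^{[c]}_{ij} = P(y_i \le j) = \sum_{k=1}^{j} \mu_{ik}$.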
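
The short Python sketch below illustrates the link between category probabilities and the cumulative scale. It assumes the common cumulative-logit formulation for ordered categorical data, in which the model is written in terms of the logits of the cumulative probabilities P(y ≤ j). The function names and the three example probabilities are hypothetical and serve only to make the arithmetic concrete.

```python
import math

def cumulative_probs(category_probs):
    """Return P(y <= j) for every category except the last.

    The last cumulative probability is always 1, so it carries no
    information and is omitted.
    """
    running_total = 0.0
    cumulative = []
    for p in category_probs[:-1]:
        running_total += p
        cumulative.append(running_total)
    return cumulative

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

# Hypothetical probabilities for one observation over three ordered categories.
probs = [0.2, 0.3, 0.5]

cum = cumulative_probs(probs)
cum_logits = [logit(c) for c in cum]

print("cumulative probabilities:", cum)
print("cumulative logits:", cum_logits)
```

An observation with J ordered categories therefore contributes J − 1 cumulative logits, which is why the vectors of probabilities in this model have several elements per observation.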

About the Companion Website

This book is accompanied by a companion website:

www.wiley.com/go/brown/applied_mixed

This website includes SAS code and datasets for most of the examples. In the future, updates and further materials may be added.