John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
Library of Congress Cataloging-in-Publication Data
Galwey, Nicholas
Introduction to mixed modelling : beyond regression and analysis of variance / N. W. Galwey. – Second edition.
pages cm
Includes bibliographical references and index.
ISBN 978-1-119-94549-9 (cloth)
1. Multilevel models (Statistics) 2. Experimental design. 3. Regression analysis. 4. Analysis of variance. I. Title.
QA276.G33 2014
519.5–dc23
2014021670
A catalogue record for this book is available from the British Library.
Preface
This book is intended for research workers and students who have made some use of the statistical techniques of regression analysis and analysis of variance (anova), but who are unfamiliar with mixed models and the criterion for fitting them called REsidual Maximum Likelihood (REML, also known as REstricted Maximum Likelihood). Such readers will know that, broadly speaking, regression analysis seeks to account for the variation in a response variable by relating it to one or more explanatory variables, whereas anova seeks to detect variation among the mean values of groups of observations. In regression analysis, the statistical significance of each explanatory variable is tested using the same estimate of residual variance, namely the residual mean square, and this estimate is also used to calculate the standard error of the effect of each explanatory variable. However, this choice is not always appropriate. Sometimes, one or more of the terms in the regression model (in addition to the residual term) represents random variation, and such a term will contribute to the observed variation in other terms. It should therefore contribute to the significance tests and standard errors of these terms, but in an ordinary regression analysis it does not. Anova, on the other hand, does allow the construction of models with additional random-effect terms, known as block terms. However, it does so only in the limited context of balanced experimental designs.
The capabilities of regression analysis can be combined with those of anova by fitting to the data a mixed model, so called because it contains both fixed-effect and random-effect terms. A mixed model allows the presence of additional random-effect terms to be recognized in the full range of regression models, not only in balanced designs. Any statistical analysis that can be specified by a general linear model (the broadest form of linear regression model) or by anova can also be specified by a mixed model. However, the specification of a mixed model requires an additional step. The researcher must decide, for each term in the model, whether effects of that term (e.g. the deviations of group means from the grand mean) can be regarded as values of a random variable, usually taken to mean that they are a random sample from some much larger population, or whether they are a fixed set. In some cases, this decision is straightforward; in others, the distinction is subtle and the decision difficult. However, provided that an appropriate decision is made (see Section 6.3), the mixed model specifies a statistical analysis which is of broader validity than regression analysis or anova, and which is nearly or fully equivalent to those methods in the special cases where they are applicable.
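As an illustrative sketch of the structure just described, a mixed model is commonly written in matrix form as shown below. The notation is a conventional one assumed here for illustration and is not taken from this preface: y is the vector of observations, X and Z are the design matrices for the fixed-effect and random-effect terms, β holds the fixed effects, u the random effects and ε the residuals, with u and ε assumed independent.

```latex
% Conventional matrix form of a linear mixed model (assumed notation).
% X\beta : fixed-effect terms;  Zu : random-effect terms;  \varepsilon : residual term.
\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},
\qquad
\mathbf{u} \sim N(\mathbf{0}, \mathbf{G}),
\quad
\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^{2}\mathbf{I})
\]
% Implied variance of the observations: the random-effect terms contribute
% to the variation in y in addition to the residual variance.
\[
\operatorname{Var}(\mathbf{y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}^{\mathsf{T}} + \sigma^{2}\mathbf{I}
\]
```

Under these assumptions, dropping the term Zu leaves the ordinary general linear model, in which the residual variance σ² is the only source of random variation. Deciding which terms belong in Xβ and which in Zu is the additional step that the specification of a mixed model requires.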
It is fairly straightforward to specify the calculations required for regression analysis and anova, and this is done in many standard textbooks. For example, Draper and Smith (1998) give a clear, thorough and extensive account of the methods of regression analysis, and Mead (1988) does the same for the analysis of variance. Solving the equations that specify a mixed model is much less straightforward. The model is fitted (that is, the best estimates of its parameters are obtained) using the REML criterion, but the fitting process requires iterative numerical methods. It is largely because of this burden of calculation that mixed models are less familiar than regression analysis and anova: it is only in about the past three decades that the development of computer power and user-friendly statistical software has allowed them to be used routinely in research. This book aims to provide a guide to the use of mixed models that is accessible to the broad community of research scientists. It focuses not on the details of calculation, but on the specification of mixed models and the interpretation of the results.
The numerical examples in this book are presented and analysed using three statistical software systems, namely
GenStat, distributed by VSN International Ltd, Hemel Hempstead, via the website https://www.vsni.co.uk/.
R, from The R Project for Statistical Computing. This software can be downloaded free of charge from the website http://www.r-project.org/.
SAS, available from the SAS Institute via the website http://www.sas.com/technologies/analytics/statistics/stat/.
GenStat is a natural choice of software to illustrate the concepts and methods employed in mixed modelling because its facilities for this purpose are straightforward to use, extensive and well integrated with the rest of the system and because their output is clearly laid out and easy to interpret. Above all, the recognition of random terms in statistical models lies at the heart of GenStat. GenStat's method of specifying anova models requires the distinction of random-effect (block) and fixed-effect (treatment) terms, which makes the interpretation of designed experiments uniquely reliable and straightforward. This approach extends naturally to mixed models and provides a firm framework within which the researcher can think and plan. Despite these merits, GenStat is not among the most widely used statistical software systems, and the numerical examples are therefore also analysed using the increasingly popular software R, as well as SAS, which is long-established and widely used in the clinical and agricultural research communities.
The book's website, http://www.wiley.com/go/beyond_regression, provides solutions to the end-of-chapter exercises, as well as data files, and programs in GenStat, R and SAS, for many of the examples in this book.
This second edition incorporates many additions and changes, as well as some corrections. The most substantial are
the addition of SAS to the software systems used;
a new chapter on meta-analysis and the multiple testing problem;
recognition of situations in which it is appropriate to specify the interaction between two factors as a random-effect term, even though both of the corresponding main effects are fixed-effect terms;
an account of the Bayesian interpretation of mixed models, an alternative to the random-sample (frequentist) interpretation mentioned above;
a fuller account of the ‘great mixed model muddle’;
the random coefficient regression model.
I am grateful to the following individuals for their valuable comments and suggestions on the manuscript of this book, and/or for introducing me to mixed-modelling concepts and techniques in the three software systems: David Balding, Aruna Bansal, Caroline Galwey, Toby Johnson, Peter Lane, Roger Payne, James Roger and David Willé. I am also grateful to the participants in the GenStat Discussion List for their helpful responses to many enquiries. (Access to this lively forum can be obtained via the website https://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=GENSTAT.) Any errors or omissions of fact or interpretation that remain are my sole responsibility. I would also like to express my gratitude to the many individuals and organizations who have given permission for the reproduction of data in the numerical examples presented. They are acknowledged individually in their respective places, but the high level of support that they have given me deserves to be recognized here.
References
Draper, N.R. and Smith, H. (1998) Applied Regression Analysis, 3rd edn, John Wiley & Sons, Inc., New York, 706 pp.
Mead, R. (1988) The Design of Experiments: Statistical Principles for Practical Application, Cambridge University Press, Cambridge, 620 pp.