Contents

PREFACE

CHAPTER 1 INTRODUCTION

1.1 PREVIEW
1.2 NOTATIONAL CONVENTIONS
1.3 BASIC CHARACTERISTICS OF A MEASUREMENT METHOD
1.4 METHOD COMPARISON STUDIES
1.5 MEANING OF AGREEMENT
1.6 A MEASUREMENT ERROR MODEL
1.7 SIMILARITY VERSUS AGREEMENT
1.8 A TOY EXAMPLE
1.9 CONTROVERSIES AND OUR VIEW
1.10 CONCEPTS RELATED TO AGREEMENT
1.11 ROLE OF CONFIDENCE INTERVALS AND HYPOTHESES TESTING
1.12 COMMON MODELS FOR PAIRED MEASUREMENTS DATA
1.13 THE BLAND-ALTMAN PLOT
1.14 COMMON REGRESSION APPROACHES
1.15 INAPPROPRIATE USE OF COMMON TESTS IN METHOD COMPARISON STUDIES
1.16 KEY STEPS IN THE ANALYSIS OF METHOD COMPARISON DATA
1.17 CHAPTER SUMMARY
1.18 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 2 COMMON APPROACHES FOR MEASURING AGREEMENT

2.1 PREVIEW
2.2 INTRODUCTION
2.3 MEAN SQUARED DEVIATION
2.4 CONCORDANCE CORRELATION COEFFICIENT
2.5 A DIGRESSION: TOLERANCE AND PREDICTION INTERVALS
2.6 LIN’S PROBABILITY CRITERION AND BLAND-ALTMAN CRITERION
2.7 LIMITS OF AGREEMENT
2.8 TOTAL DEVIATION INDEX AND COVERAGE PROBABILITY
2.9 INFERENCE ON AGREEMENT MEASURES
2.10 CHAPTER SUMMARY
2.11 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 3 A GENERAL APPROACH FOR MODELING AND INFERENCE

3.1 PREVIEW
3.2 MIXED-EFFECTS MODELS
3.3 A LARGE-SAMPLE APPROACH TO INFERENCE
3.4 MODELING AND ANALYSIS OF METHOD COMPARISON DATA
3.5 CHAPTER SUMMARY
3.6 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 4 PAIRED MEASUREMENTS DATA

4.1 PREVIEW
4.2 MODELING OF DATA
4.3 EVALUATION OF SIMILARITY AND AGREEMENT
4.4 CASE STUDIES
4.5 CHAPTER SUMMARY
4.6 TECHNICAL DETAILS
4.7 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 5 REPEATED MEASUREMENTS DATA

5.1 PREVIEW
5.2 INTRODUCTION
5.3 DISPLAYING DATA
5.4 MODELING OF DATA
5.5 EVALUATION OF SIMILARITY AND AGREEMENT
5.6 EVALUATION OF REPEATABILITY
5.7 CASE STUDIES
5.8 CHAPTER SUMMARY
5.9 TECHNICAL DETAILS
5.10 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 6 HETEROSCEDASTIC DATA

6.1 PREVIEW
6.2 INTRODUCTION
6.3 VARIANCE FUNCTION MODELS
6.4 REPEATED MEASUREMENTS DATA
6.5 PAIRED MEASUREMENTS DATA
6.6 CHAPTER SUMMARY
6.7 TECHNICAL DETAILS
6.8 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 7 DATA FROM MULTIPLE METHODS

7.1 PREVIEW
7.2 INTRODUCTION
7.3 DISPLAYING DATA
7.4 EXAMPLE DATASETS
7.5 MODELING UNREPLICATED DATA
7.6 MODELING REPEATED MEASUREMENTS DATA
7.7 MODEL FITTING AND EVALUATION
7.8 EVALUATION OF SIMILARITY AND AGREEMENT
7.9 EVALUATION OF REPEATABILITY
7.10 CASE STUDIES
7.11 CHAPTER SUMMARY
7.12 TECHNICAL DETAILS
7.13 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 8 DATA WITH COVARIATES

8.1 PREVIEW
8.2 INTRODUCTION
8.3 MODELING OF DATA
8.4 EVALUATION OF SIMILARITY, AGREEMENT, AND REPEATABILITY
8.5 CASE STUDY
8.6 CHAPTER SUMMARY
8.7 TECHNICAL DETAILS
8.8 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 9 LONGITUDINAL DATA

9.1 PREVIEW
9.2 INTRODUCTION
9.3 MODELING OF DATA
9.4 EVALUATION OF SIMILARITY AND AGREEMENT
9.5 CASE STUDY
9.6 CHAPTER SUMMARY
9.7 TECHNICAL DETAILS
9.8 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 10 A NONPARAMETRIC APPROACH

10.1 PREVIEW
10.2 INTRODUCTION
10.3 THE STATISTICAL FUNCTIONAL APPROACH
10.4 EVALUATION OF SIMILARITY AND AGREEMENT
10.5 CASE STUDIES
10.6 CHAPTER SUMMARY
10.7 TECHNICAL DETAILS
10.8 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 11 SAMPLE SIZE DETERMINATION

11.1 PREVIEW
11.2 INTRODUCTION
11.3 THE SAMPLE SIZE METHODOLOGY
11.4 CASE STUDY
11.5 CHAPTER SUMMARY
11.6 BIBLIOGRAPHIC NOTE
EXERCISES

CHAPTER 12 CATEGORICAL DATA

12.1 PREVIEW
12.2 INTRODUCTION
12.3 EXPERIMENTAL SETUPS AND EXAMPLES
12.4 COHEN’S KAPPA COEFFICIENT FOR DICHOTOMOUS DATA
12.5 KAPPA TYPE MEASURES FOR MORE THAN TWO CATEGORIES
12.6 CASE STUDIES
12.7 MODELS FOR EXPLORING AGREEMENT
12.8 DISCUSSION
12.9 CHAPTER SUMMARY
12.10 BIBLIOGRAPHIC NOTE
EXERCISES

REFERENCES

DATASET LIST

INDEX

EULA

List of Tables

Chapter 1

Table 1.1 Interpretation of test theory terms in the context of method comparison studies under the measurement error model (1.6).
Table 1.2 Plasma volume measurements expressed as a percentage of normal values due to Hurley and Nadler (data originally provided by C. Doré, see Cotes et al., 1986).
Table 1.3 Potato weights (grams) data for Exercise 1.6.
Table 1.4 IPI angle (°) data for Exercise 1.8.
Table 1.5 Fat content (g/100ml) data for Exercise 1.9.

Chapter 2

Table 2.1 Oxygen consumption (ml/kg/min) data for Exercise 2.3.

Chapter 4

Table 4.1 Summary of estimates of bivariate normal model parameters and measures of similarity and agreement for oxygen saturation data. Lower bound for CCC and upper bound for TDI are presented. Methods 1 and 2 refer to OSM and pulse, respectively.
Table 4.2 Summary of estimates of model parameters and measures of similarity and agreement for log-scale plasma volume data. Lower bound for CCC and upper bound for TDI are presented. Methods 1 and 2 refer to Hurley and Nadler methods, respectively.
Table 4.3 Summary of estimates of bivariate normal model parameters and measures of similarity and agreement for vitamin D data. Lower bound for CCC and upper bound for TDI are presented.
Table 4.4 Cardiac output (l/min) data for Exercise 4.8.

Chapter 5

Table 5.1 Summary of estimates of model parameters and measures of similarity, repeatability, and agreement for kiwi data. Lower bounds for CCC and upper bounds for TDI are presented. Methods 1 and 2 refer to micrometer and microscope, respectively.
Table 5.2 Summary of estimates of model parameters and measures of similarity, repeatability, and agreement for the oximetry data. Lower bounds for CCC and upper bounds for TDI are presented. Methods 1 and 2 refer to pulse oximetry and CO-oximetry, respectively.
Table 5.3 Kiwi data consisting of eggshell thickness measurements (in μm). They are provided by P. Cassey, see Igic et al. (2010).
Table 5.4 Knee joint angle (in degrees) data for Exercise 5.8.
Table 5.5 Cardiac ejection fraction (in %) data for Exercise 5.9 (data provided by L. S. Bowling, see Bowling et al., 1993).
Table 5.6 Peak expiratory flow rate (in l/min) data for Exercise 5.10.
Table 5.7 Coronary artery calcium score data for Exercise 5.11.

Chapter 6

Table 6.1 Summary of parameter estimates for cholesterol data. Methods 1 and 2 refer to Cobas Bio and Ektachem, respectively.
Table 6.2 Summary of parameter estimates for cyclosporin data. Methods 1 and 2 refer to HPLC and RIA, respectively.

Chapter 7

Table 7.1 Summary of parameter estimates for systolic blood pressure data. Methods 1, 2, 3, and 4 refer to the observers MS1, MS2, MS3, and DS, respectively.
Table 7.2 Estimates and 95% simultaneous confidence intervals for all-pairwise bias differences and precision ratios for systolic blood pressure data. Methods 1, 2, 3, and 4 refer to the observers MS1, MS2, MS3, and DS, respectively.
Table 7.3 Estimates and one-sided 95% simultaneous confidence bounds for all-pairwise CCCs and TDIs (with p = 0.90) for systolic blood pressure data. Methods 1, 2, 3, and 4 refer to the observers MS1, MS2, MS3, and DS, respectively.
Table 7.4 Summary of parameter estimates for tumor size data.
Table 7.5 Estimates and 95% simultaneous confidence intervals for all-pairwise bias differences and modified precision ratios for tumor size data.
Table 7.6 Estimates and one-sided 95% individual confidence bounds for repeatability versions of CCC and TDI (0.90) for tumor size data.
Table 7.7 Estimates and 95% one-sided simultaneous confidence bounds for all-pairwise values of CCC and TDI (0.90) for tumor size data.
Table 7.8 Fractional area change measurements (in %) for Exercise 7.9.

Chapter 8

Table 8.1 Summary of estimates of parameters of Models A (with variance covariate) and B (with mean and variance covariates) for blood pressure data. Methods 1 and 2 refer to mercury sphygmomanometer and automatic monitor, respectively.

Chapter 9

Table 9.1 Summary of estimates of model parameters for percentage body fat data. Methods 1 and 2 refer to caliper and DEXA, respectively.

Chapter 10

Table 10.1 Nonparametric and parametric estimates for measures of similarity and agreement for unreplicated blood pressure data. The parametric estimates are based on the bivariate normal model (4.6). Lower bounds for CCC and upper bounds for TDI are presented. Methods 1 and 2 refer to arm and finger methods, respectively.
Table 10.2 Estimates and two-sided 95% simultaneous confidence intervals for all-pairwise mean differences and variance ratios for replicated blood pressure data. Methods 1, 2, and 3 refer to observers J and R and the monitor, respectively.
Table 10.3 Estimates and one-sided 95% simultaneous confidence bounds for all-pairwise CCCs and TDIs (with p = 0.90) for replicated blood pressure data. Methods 1, 2, and 3 refer to observers J and R and the monitor, respectively.
Table 10.4 Crab claws data consisting of lengths of crab claws (in mm) for Exercise 10.16. They are provided by P. Cassey.

Chapter 11

Table 11.1 Expected standard errors for estimators of various measures.

Chapter 12

Table 12.1 Observed conclusions of randomized clinical trials and preceding meta analyses.
Table 12.2 Summary of observed responses of physicians and nurses on medical and surgical cases flagged by the CSP system.
Table 12.3 Diagnosis by two neurologists of patients on the likelihood of multiple sclerosis.
Table 12.4 Summary of observed responses of raters 1 and 2.
Table 12.5 Fitness-to-drive evaluation data for Exercise 12.13.
Table 12.6 Frequencies of observed classifications for disease classification data for Exercise 12.15.
Table 12.7 Frequencies of observed evaluations of six psychiatrists for 30 patients for Exercise 12.17.

List of Illustrations

Chapter 1

Figure 1.1 Scatterplots of simulated paired measurements data with high correlations superimposed with the line of equality. (a) The methods have perfect agreement. (b) The methods have unequal means but equal variances. (c) The methods have unequal variances but equal means. (d) The methods have unequal means and variances.
Figure 1.2 Scatterplots (panels (a) and (c)) and Bland-Altman plots (panels (b) and (d)) for two simulated datasets. The line of equality and the zero line are, respectively, superimposed on the two plots.
Figure 1.3 Plots for oxygen saturation data. Panel (a): Scatterplot with line of equality. Panel (b): Bland-Altman plot with zero line.
Figure 1.4 Scatterplots and Bland-Altman plots for plasma volume data. Panels (a) and (b) show measurements on original scale (%); panels (c) and (d) show log-scale measurements. The line of equality and the zero line are, respectively, superimposed on the two plots.
Figure 1.5 Scatterplots and Bland-Altman plots for vitamin D data. Panels (a) and (b) show measurements on original scale (ng/mL); panels (c) and (d) show log-scale measurements. The line of equality and the zero line are, respectively, superimposed on the two plots.
Figure 1.6 Variations of the usual Bland-Altman plot. Panel (a): Plot of ratio versus average for plasma volume data. Panel (b): Plot of relative difference versus average for vitamin D data. The horizontal lines in the plots, respectively, mark the points 1 and 0.
Figure 1.7 A scatterplot of simulated data superimposed with the line of equality, around which the data are truly scattered, and the true regression line of Y₂ on Y₁.
Figure 1.8 Vertical and perpendicular distances of a data point from a line. The former is used in ordinary least squares whereas the latter is used in orthogonal least squares.
Figure 1.9 Deming regression line and the two ordinary least squares regression lines for the data displayed in Figure 1.7.
Figure 1.10 Trellis plot of log-scale plasma volume data. The subjects are sorted according to their average measurement.

Chapter 4

Figure 4.1 Trellis plot of oxygen saturation data.
Figure 4.2 Residual plot for log-scale plasma volume data.
Figure 4.3 Trellis plot of log-scale vitamin D data.

Chapter 5

Figure 5.1 Trellis plot of kiwi data.
Figure 5.2 Trellis plot of oximetry data.
Figure 5.3 Plots for oximetry data. Top panel (left to right): Scatterplot with line of equality and Bland-Altman plot with zero line, with subject ID (1 to 61) as plotting symbol. Bottom panel (left to right): Same as top panel but with a common plotting symbol and points from the same subject joined by a broken line.
Figure 5.4 Plots for kiwi eggshell thickness data. Top panel (left to right): Scatterplot with line of equality and Bland-Altman plot with zero line based on 16 randomly formed measurement pairs. Bottom panel (left to right): Same as top panel but based on 16 average measurements.
Figure 5.5 Interaction plot for kiwi data depicting subject × method interaction. Lines join points from the same subject.
Figure 5.6 Interaction plots for oximetry data depicting subject × method interaction (left panel) and subject × time interaction (right panel). Lines join points from the same subject.
Figure 5.7 Residual plot for kiwi data.
Figure 5.8 Plots of residuals (top panel) and their absolute values (bottom panel) against fitted values for each separate method in oximetry data. A nonparametric smooth is added to the bottom plots.

Chapter 6

Figure 6.1 Trellis plot of cyclosporin data.
Figure 6.2 Plots for cyclosporin data. Panel (a): Bland-Altman plot with zero line. Panel (b): Plot of absolute values of centered differences against averages.
Figure 6.3 Trellis plot of cholesterol data.
Figure 6.4 Plots for cholesterol data. Top panel (left to right): Scatterplot with line of equality and Bland-Altman plot with zero line based on 100 randomly formed measurement pairs. Bottom panel (left to right): Same as top panel but based on average measurements.
Figure 6.5 Plots of standardized residuals (top panel) and their absolute values (bottom panel) against fitted values from a homoscedastic fit to cholesterol data.
Figure 6.6 Plots of log of absolute residuals from a homoscedastic fit to cholesterol data against the log of the subject's average cholesterol level (top panel) and against the average itself (bottom panel). A simple linear regression fit is superimposed on each plot.
Figure 6.7 Residual plots from a heteroscedastic fit to cholesterol data using power variance function models.
Figure 6.8 Observed versus fitted within-subject standard deviations from power (solid line) and exponential (broken curve) variance function models for cholesterol data. The covariate, the subject average, is plotted on the horizontal axis. The observed values are represented by the points and the fitted values are represented by the curves.
Figure 6.9 95% limits of inter- and intra-method agreement for cholesterol data as a function of magnitude of measurement. The inter-method limits, based on the distribution of D, are centered at 5.58. The intra-method limits, based on the distributions of D₁ and D₂, are centered at zero.
Figure 6.10 Plots for cholesterol data. (a) Estimate (solid curve) and 95% pointwise two-sided confidence band (broken curves) for precision ratio; (b) 95% lower confidence bands for inter- and intra-method versions of CCC; (c) 95% upper confidence bands for inter- and intra-method versions of TDI (0.90) as well as their reflections around the horizontal line at zero, giving the corresponding pointwise tolerance bands; and (d) same as panel (c) but with Ektachem recalibrated to have the same estimated mean as Cobas Bio.
Figure 6.11 A scatterplot of cyclosporin data with line of equality.
Figure 6.12 Plots of absolute standardized differences against averages for cyclosporin data with exponential and power variance function fits. A horizontal line at is superimposed on each plot.
Figure 6.13 Plot of absolute centered differences against averages for cyclosporin data superimposed with times fitted standard deviations of differences under power (solid line) and exponential (broken curve) variance function fits.
Figure 6.14 Estimate (solid curve) of the variance ratio for RIA over HPLC and its 95% pointwise confidence band (broken curves) for cyclosporin data.
Figure 6.15 95% pointwise bounds for cyclosporin data with (solid curve) and without (broken curve) the outlier. Panel (a): Lower confidence bounds for CCC. Panel (b): Upper confidence bounds for TDI and their reflections around the horizontal line at zero, giving pointwise tolerance bands.

Chapter 7

Figure 7.1 Trellis plot of systolic blood pressure data. The symbols for the four observers are given at the top of the plot.
Figure 7.2 Side-by-side boxplots for systolic blood pressure data.
Figure 7.3 A matrix of scatterplots with line of equality (below the diagonal) and Bland-Altman plots with zero line (above the diagonal) for systolic blood pressure data. The measurements range from 82 to 236 mm Hg and their differences range from −16 to 30 mm Hg.
Figure 7.4 Trellis plot of tumor size data. The symbols for the five readers are given at the top of the plot.
Figure 7.5 Side-by-side boxplots for tumor size data.
Figure 7.6 Interaction plot for tumor size data depicting lesion × reader interaction.
Figure 7.7 A matrix of scatterplots with line of equality (below the diagonal) and Bland-Altman plots with zero line (above the diagonal) for tumor size data. One measurement from each reader on every subject is randomly selected for this plot. The measurements range from 1 to 9 cm and their differences range from 3 to 4 cm.
Figure 7.8 Residual plot for systolic blood pressure data.
Figure 7.9 Residual plot for tumor size data.

Chapter 8

Figure 8.1 Trellis plots of blood pressure data by gender.
Figure 8.2 Side-by-side boxplots for blood pressure data by gender.
Figure 8.3 Scatterplots of blood pressure against age (left panel) and against heart rate (right panel).
Figure 8.4 Plots for blood pressure data. Top panel (left to right): Scatterplot with line of equality and Bland-Altman plot with zero line based on randomly formed measurement pairs. Bottom panel (left to right): Same as top panel but based on average measurements.
Figure 8.5 Plots of log of absolute residuals from a homoscedastic fit to blood pressure data against the log of the subject average blood pressure. A simple linear regression fit is superimposed on each plot.
Figure 8.6 Residual plots from a heteroscedastic fit to blood pressure data using power variance function models.
Figure 8.7 Estimates and two-sided 95% simultaneous confidence bands for the precision ratio λ₁₂ under Models A and B.
Figure 8.8 One-sided 95% simultaneous confidence bands for TDI(0.90) and CCC (lower bands for CCC and upper bands for TDI) and their intra-method versions for mercury (method 1) and automatic (method 2) methods.

Chapter 9

Figure 9.1 Trajectories of percentage body fat measurements for 112 girls. Lines connect the available time points from the same subject. The gray curves in the middle are the estimated mean functions.
Figure 9.2 Trajectories of DEXA minus caliper differences in percentage body fat measurements for 91 girls with complete measurement pairs. Lines connect the available time points from the same subject. The gray curve in the middle is the estimated mean difference function.
Figure 9.3 Side-by-side boxplots of DEXA minus caliper differences in percentage body fat measurements for visit numbers two through nine, with a reference line at zero.
Figure 9.4 Scatterplots of percentage body fat measurements for visit numbers two (bottom left panel) through nine (top right panel). The line of equality is superimposed on each plot.
Figure 9.5 Bland-Altman plots of percentage body fat measurements for visit numbers two (bottom left panel) through nine (top right panel) for available pairs. A horizontal line at zero is superimposed on each plot.
Figure 9.6 Estimated semivariogram functions for caliper and DEXA methods computed using standardized residuals from a model fit to percentage body fat data with independent within-subject errors. A nonparametric smooth curve is added to each plot to show the underlying trend.
Figure 9.7 Sample autocorrelation functions for normalized residuals of caliper and DEXA methods under a model fit to percentage body fat data with AR(1) errors. The dashed curves represent the 95% bounds (9.19).
Figure 9.8 Estimate of mean difference in percentage body fat using caliper and DEXA methods (solid curve), its 95% two-sided simultaneous confidence band (shaded region), and the 95% limits of agreement (broken curves).
Figure 9.9 Estimates of CCC and TDI functions (solid curves) and their 95% one-sided simultaneous confidence bands (shaded regions) for percentage body fat data.

Chapter 10

Figure 10.1 Trellis plot of unreplicated blood pressure data.
Figure 10.2 A scatterplot with line of equality (left panel) and a Bland-Altman plot with zero line (right panel) for unreplicated blood pressure data.
Figure 10.3 Side-by-side boxplots for unreplicated blood pressure data.
Figure 10.4 Trellis plot of replicated blood pressure data.
Figure 10.5 A matrix of scatterplots of systolic blood pressures with line of equality (below the diagonal) and Bland-Altman plots with zero line (above the diagonal) for replicated blood pressure data. One measurement per method from each of the 85 subjects is randomly selected for this plot. The measurements range from 76 to 227 mm Hg and their differences range from −25 to 111 mm Hg.
Figure 10.6 Side-by-side box plots for all measurements of replicated blood pressure data.

Chapter 11

Figure 11.1 Expected standard errors for estimators of log{TDI(0.90)} (left panel) and z(CCC) (right panel) as functions of n for a paired measurements design.

Chapter 12

Figure 12.1 Bangdiwala agreement chart for the MS data in Table 12.3.
Figure 12.2 Range of κ values for a given probability of agreement θ. Given θ = 0.8, κ cannot exceed 0.6; even when θ is as high as 0.90, κ cannot exceed 0.8, and can be negative.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Pankaj K. Choudhary and Haikady N. Nagaraja to be identified as the authors of this work has been asserted in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data
Names: Choudhary, Pankaj K. (Pankaj Kumar), 1975- author. | Nagaraja, H. N. (Haikady Navada), 1954- author.
Title: Measuring agreement : models, methods, and applications / by Pankaj K. Choudhary, Haikady N. Nagaraja.
Description: Hoboken, NJ : John Wiley & Sons, 2017. | Series: Wiley series in probability and statistics | Includes bibliographical references and index. | Description based on print version record and CIP data provided by publisher; resource not viewed.
Identifiers: LCCN 2017022085 (print) | LCCN 2017037255 (ebook) | ISBN 9781118553145 (pdf) | ISBN 9781118553244 (epub) | ISBN 9781118078587 (cloth)
Subjects: LCSH: Statistics--Methodology.
Classification: LCC QA276.A2 (ebook) | LCC QA276.A2 C46 2017 (print) | DDC 001.4/22--dc23
LC record available at https://lccn.loc.gov/2017022085

Cover images: (Background) © BlackJack3D/Gettyimages; (Graph) Courtesy of Pankaj K. Choudhary and Haikady N. Nagaraja
Cover design by Wiley

To:
My parents, and Swati, Aalo, and Arushi (PKC)
Jyothi (HNN)

Preface

This book presents statistical models and methods for analyzing common types of data collected in method comparison experiments and illustrates their application through detailed case studies. The main aim of these experiments is to evaluate agreement between two or more methods of measurement. Although such studies are particularly abundant in health-related fields, they are also conducted in other disciplines, including metrology, ecology, and social and behavioral sciences.

Currently, at least six books cover the topic of agreement evaluation, including von Eye and Mun (2004), Carstensen (2010), Dunn (2004), Shoukri (2010), Broemeling (2009), and Lin et al. (2011). Of these, the first focuses exclusively on categorical data, and the second on continuous data. Others consider both types of data with varying levels of depth and choice of topics. Our book also considers both but with a primary focus on continuous data and one chapter devoted to categorical data. By providing chapter-length treatments of the common types of continuous data, it offers comprehensive coverage of the topic, and its scope is broader than any other book currently available. It does not, however, offer a complete survey of the literature. For example, measurement error models, Bayesian methods, and approaches based on generalized estimating equations are not included.

Essentially two principles guided us while writing this book. The first was to view the analysis of method comparison data as a two-step procedure where, in step 1, an adequate model for the data is found, and in step 2, inferential techniques are applied for appropriate functions of the parameters of the model found in step 1. For modeling of data, we primarily rely on mixed-effects models because they capture dependence in a subject’s measurements in an intuitively appealing manner by means of random subject effects; and they also offer a unified framework for dealing with a variety of data types. Besides, they can be fit by the maximum likelihood method using any commonly available statistical software package. For inference, we use the standard large-sample theory and invoke a bootstrap approach whenever the sample size seems too small for the asymptotic methods to be accurate. The second principle was to strive to make the presentation accessible to a wide audience while at the same time making the book theoretically rigorous and self-contained with necessary technical details and references. We have attempted to strike this balance by separating the technical details from the methodological descriptions, forgoing the references in favor of a bibliographic note at the end of each chapter, and by presenting detailed analyses of several real datasets.

The book is organized into twelve chapters. The first eleven are concerned with continuous data while the last covers categorical data. Chapter 1 provides a general introduction to studies comparing two measurement methods and discusses key concepts and statistical issues and tools involved in their analysis. Chapter 2 introduces various measures of agreement for continuous data. Chapter 3 describes mixed-effects models in general and presents the large-sample approach for inference. It provides the technical foundation for the rest of the book and can be skipped by a reader interested in applications. Chapters 4 through 9 consider continuous data collected from various types of experiments, with study designs becoming increasingly complex. In order, these chapters are devoted to designs with paired measurements, repeated measurements, heteroscedastic measurements, more than two methods, covariates, and longitudinal data. Chapter 10 presents a nonparametric approach for data that do not satisfy assumptions of a mixed-effects model. Chapter 11 considers sample size determination for designing a method comparison study with continuous data. Chapter 12 takes up the question of agreement with categorical data.

Even though the presentation is self-contained, some statistical background is expected from the readers. Familiarity with basic statistical concepts such as maximum likelihood estimation, hypothesis testing, confidence intervals, correlation, and linear regression is necessary. A prior introduction to mixed-effects models and linear algebra will enhance the understanding of the technical details.

The free statistical software R (R Core Team, 2015) has been used to perform all the computations and to generate all the graphics presented in this book. However, the R code is not presented. Much of the code and many of the datasets used here are publicly available at the companion website: http://www.utdallas.edu/~pankaj/agreement_book/

Some familiarity with R programming is assumed for following the code and understanding the output produced. In addition to the base and graphics packages of R, the following packages and their dependencies have been used in preparing this book: lattice (Sarkar, 2008), latticeExtra (Sarkar and Andrews, 2013), Matrix (Bates and Maechler, 2015), mvtnorm (Genz et al., 2015), multcomp (Hothorn et al., 2008), nlme (Pinheiro et al., 2015), numDeriv (Gilbert and Varadhan, 2015), tikzDevice (Sharpsteen and Bracken, 2015), and xtable (Dahl, 2016).

The book is targeted primarily towards two groups of researchers. The first consists of biomedical and social and behavioral scientists interested in the development and validation of measurement methods. The second includes statisticians engaged in the design and analysis of method comparison studies and in the development of associated statistical methodologies. It can also serve as a textbook for a semester-long special topics course at the graduate level. With that purpose, we have incorporated numerous theoretical and data-centric exercises at the end of the chapters that expand on the material covered in the main body. These exercises provide practice for mastering methodological details and applying the results.

We appreciate the support from our institutions as we marched through this project and for their outstanding library and computing facilities. We thank all those scientists whose dedicated research we were able to highlight in this work. We thank our long-time friends and colleagues for their advice and encouragement, including Professors Babis Papachristou (Rowan University), Michael Baron (American University), Vladimir Dragovic and Vish Ramakrishna (UT Dallas), and Tom Santner and Doug Wolfe (Ohio State University). We thank Professor Phill Cassey (University of Adelaide) for introducing us to applications in ecology and providing datasets, and Professor Chaitra Nagaraja (Fordham University) for producing the plots in Chapter 12. We also thank Professors Huiman Barnhart (Duke University), Douglas Hawkins (University of Minnesota), Vernon Chinchilli (Pennsylvania State University), and Michael Haber (Emory University) for sharing their datasets.

We are grateful to Professors Mohamed Shoukri (King Faisal Specialist Hospital and Research Centre) and Tony Ng (Southern Methodist University) for reading an earlier draft of the manuscript and providing valuable comments. We thank Susanne Steitz-Filler, Allison McGinniss, and Melissa Yanuzzi from John Wiley for guiding the project from start to finish and for their patience and perseverance. We invite the input of our readers on the coverage and presentation here as well as on the companion website as there is always room for improvement.

This book would not have been possible without the support of our family members. They gracefully sacrificed their time with us to allow us to work on a project that seemed to take forever. We take this opportunity to thank them all from the bottom of our hearts.

P. K. Choudhary & H. N. Nagaraja

Richardson, Texas

Columbus, Ohio

July, 2017
