
Stochastic Models in Survival Analysis and Reliability Set

coordinated by
Catherine Huber-Carol and Mikhail Nikulin

Volume 3

Chi-squared Goodness-of-fit Tests for Censored Data

Mikhail S. Nikulin

Ekaterina V. Chimitova


Introduction

Chi-squared testing is one of the most commonly applied statistical techniques. The chi-squared test and its modifications are used to test the goodness-of-fit of various probabilistic models and provide reliable answers for researchers in engineering, quality control, finance, medicine, biology and other fields.

This book is devoted to the construction and application of chi-squared goodness-of-fit tests for complete and censored data. We give examples of chi-squared tests for various distributions widely used in practice, and also consider chi-squared tests for parametric proportional hazards models and accelerated failure time models, which are widely used in reliability and survival analysis. Special attention is paid to the choice of grouping intervals and to simulation studies.

Unfortunately, in many books on statistical data analysis, chi-squared tests are applied incorrectly or inefficiently. Classical chi-squared tests assume that unknown distribution parameters are estimated from grouped data, but in practice this assumption is often forgotten. In this book, we consider modified chi-squared tests, which do not suffer from this drawback. Moreover, we describe chi-squared goodness-of-fit tests for censored data with time-dependent covariates. The aim of this book is thus to demonstrate the application of the modified chi-squared tests in a wide variety of specific situations.

In the complete data case, a well-known modification of the classical chi-squared tests is the Nikulin-Rao-Robson statistic, which is based on the differences between two estimators of the probabilities of falling into the grouping intervals: one estimator is based on the empirical distribution function, the other on the maximum likelihood estimators of the unknown parameters of the tested model, computed from the initial non-grouped data [NIK 73b], [NIK 73a], [NIK 73c], [DZH 74], [KAP 58], [DZH 82], [DZH 83], [RAO 74], [VOI 93a], [VOI 96], [GRE 96], [DRO 88], [DRO 89], [VAN 98], [KEC 02], [VOI 13].
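To make the construction concrete, the following is a minimal sketch for a normal model with θ = (μ, σ), assuming fixed interior interval boundaries and the standard quadratic-form representation Y² = X² + n⁻¹aᵀ(J − J_g)⁻¹a. Here X² is the Pearson sum with cell probabilities evaluated at the ungrouped-data MLE, J is the per-observation Fisher information, J_g is the information contained in the grouped data, and a collects the weighted differences between observed and estimated cell counts. The helper names (`nrr_normal`, `norm_cdf`, etc.) are hypothetical and chosen for illustration only. Under the null hypothesis, Y² is asymptotically chi-squared with r − 1 degrees of freedom, where r is the number of cells.

```python
import math
import random
from bisect import bisect_left


def norm_cdf(z):
    # Standard normal CDF via the error function (math.erf(+-inf) = +-1).
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    # Standard normal density; returns 0 at +-inf.
    return 0.0 if math.isinf(z) else math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def zpdf(z):
    # z * phi(z), using its limit 0 at +-inf to avoid inf * 0.
    return 0.0 if math.isinf(z) else z * norm_pdf(z)


def nrr_normal(data, cuts):
    """Hypothetical helper: Nikulin-Rao-Robson Y^2 for H0: X ~ N(mu, sigma^2).

    cuts: sorted interior boundaries x_1 < ... < x_{r-1}; the outer cells are
    unbounded, so there are r = len(cuts) + 1 cells (x_{i-1}, x_i].
    Returns (Y2, X2, degrees_of_freedom).
    """
    n = len(data)
    # MLEs from the ungrouped data (sigma uses divisor n).
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)

    r = len(cuts) + 1
    nu = [0] * r
    for x in data:
        nu[bisect_left(cuts, x)] += 1  # observed count in each cell

    z = [-math.inf] + [(c - mu) / sigma for c in cuts] + [math.inf]
    p, d_mu, d_sig = [], [], []
    for lo, hi in zip(z[:-1], z[1:]):
        p.append(norm_cdf(hi) - norm_cdf(lo))
        # Derivatives of the cell probability with respect to mu and sigma.
        d_mu.append(-(norm_pdf(hi) - norm_pdf(lo)) / sigma)
        d_sig.append(-(zpdf(hi) - zpdf(lo)) / sigma)

    # Pearson part with cell probabilities evaluated at the ungrouped-data MLE.
    x2 = sum((nu[i] - n * p[i]) ** 2 / (n * p[i]) for i in range(r))

    # M = J - J_g: full-data Fisher information minus grouped-data information.
    grads = (d_mu, d_sig)
    j_full = [[1.0 / sigma**2, 0.0], [0.0, 2.0 / sigma**2]]
    m = [[j_full[a][b] - sum(grads[a][i] * grads[b][i] / p[i] for i in range(r))
          for b in range(2)] for a in range(2)]
    # a_j = sum_i (nu_i - n p_i) / p_i * dp_i/dtheta_j
    avec = [sum((nu[i] - n * p[i]) * g[i] / p[i] for i in range(r)) for g in grads]

    # Quadratic form a^T M^{-1} a for the 2x2 matrix M.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    quad = (avec[0] ** 2 * m[1][1] - avec[0] * avec[1] * (m[0][1] + m[1][0])
            + avec[1] ** 2 * m[0][0]) / det
    return x2 + quad / n, x2, r - 1


rng = random.Random(1)
sample = [rng.gauss(10.0, 2.0) for _ in range(200)]
y2, x2, df = nrr_normal(sample, [7.0, 9.0, 10.0, 11.0, 13.0])
print(y2, x2, df)
```

Because grouping can only lose information, J − J_g is positive definite and the correction term is non-negative, so Y² is never smaller than the Pearson part X² computed with the same estimates.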

Li and Doss [LI 93] considered modifications of these results using wider classes of estimators based on the initial data. Goodness-of-fit tests for linear regression have been studied in [MUK 77], [PIE 79], [LOY 80], [KOU 84], [ZAC 71] and [VOI 07].

Habib and Thomas [HAB 86] and Hollander and Peña [HOL 92] considered natural modifications of the Nikulin-Rao-Robson statistic in the case of censored data without covariates. These tests are also based on the differences between two estimators of the probabilities of falling into the grouping intervals. The first estimator is based on the Kaplan-Meier estimator of the cumulative distribution function; the second is based on the maximum likelihood estimators of the unknown parameters of the tested model, computed from the initial non-grouped censored data. Nikulin and Solev [NIK 99] considered a Pearson-type chi-squared goodness-of-fit test for doubly censored data.
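The first of these two estimators can be sketched as follows. This is a minimal sketch, assuming right-censored data given as observed times with failure indicators and interval boundaries fixed in advance. It builds the Kaplan-Meier (product-limit) estimate of the cumulative distribution function and takes its increments over the grouping intervals. The helper names `km_cdf` and `interval_probs` are hypothetical. Ties follow the usual convention that units censored at a failure time are still at risk at that time.

```python
import math


def km_cdf(times, events):
    """Hypothetical helper: return t -> 1 - S_KM(t) from right-censored data.

    times: observed X_i = min(T_i, C_i); events: 1 for a failure, 0 if censored.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    steps = []  # (failure time, Kaplan-Meier survival just after it)
    surv = 1.0
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        failures = ties = 0
        while i < len(pairs) and pairs[i][0] == t:
            failures += pairs[i][1]
            ties += 1
            i += 1
        if failures:
            surv *= 1.0 - failures / at_risk  # product-limit step
            steps.append((t, surv))
        at_risk -= ties  # failures and censorings at t leave the risk set after t

    def cdf(t):
        s = 1.0
        for u, s_u in steps:
            if u > t:
                break
            s = s_u
        return 1.0 - s

    return cdf


def interval_probs(cdf, bounds):
    # Estimated probability of falling into each interval (a_{j-1}, a_j].
    return [cdf(b) - cdf(a) for a, b in zip(bounds[:-1], bounds[1:])]


F = km_cdf([1, 2, 3, 4], [1, 0, 1, 1])
print(interval_probs(F, [0, 2, 3.5, math.inf]))  # [0.25, 0.375, 0.375]
```

Without censoring, the Kaplan-Meier estimate reduces to the empirical distribution function, so these interval probabilities coincide with the complete-data ones used by the Nikulin-Rao-Robson statistic.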

The idea of comparing observed and expected numbers of failures in time intervals is due to Akritas [AKR 88] and was further developed by Hjort [HJO 90]. In the censored data case, Hjort [HJO 90], Royston and Lambert [ROY 11] and Peña [PEN 98a] considered goodness-of-fit tests for parametric Cox models, while Gray and Pierce [GRA 85], Akritas and Torbeyns [AKR 97] and Zhang [ZHA 99] considered them for linear regression models. Peña [PEN 98b] considered smooth goodness-of-fit tests for composite hypotheses in hazard-based models with possible covariates.
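The comparison of observed and expected numbers of failures can be illustrated in the simplest case. This is a minimal sketch, assuming a right-censored sample, an exponential model with cumulative hazard Λ(t) = λt, and interval ends 0 = a_0 < a_1 < ... < a_k = ∞ chosen in advance. The expected count in an interval is the estimated cumulative hazard accumulated over the time each unit spends at risk in that interval. The helper name is hypothetical. A complete test statistic would also need the covariance matrix of the differences, which this sketch omits.

```python
import math


def observed_expected_exp(times, events, bounds):
    """Hypothetical helper: observed vs expected failures per interval (a_{j-1}, a_j]
    under an exponential model with cumulative hazard Lambda(t) = lam * t.

    bounds must start at 0 and end at math.inf so the intervals cover [0, inf).
    """
    lam = sum(events) / sum(times)  # censored-data MLE: failures / total time on test
    k = len(bounds) - 1
    observed = [0] * k
    expected = [0.0] * k
    for x, d in zip(times, events):
        for j in range(k):
            a, b = bounds[j], bounds[j + 1]
            if d == 1 and a < x <= b:
                observed[j] += 1
            # Cumulative hazard accumulated while this unit is at risk in (a, b].
            expected[j] += lam * max(0.0, min(x, b) - a)
    return lam, observed, expected


lam, obs, exp_counts = observed_expected_exp(
    [0.5, 1.2, 2.0, 3.1, 4.4, 6.0], [1, 1, 0, 1, 0, 1], [0.0, 1.0, 3.0, math.inf]
)
print(lam, obs, exp_counts)
```

When the intervals cover [0, ∞) and λ is the maximum likelihood estimate, the expected counts add up exactly to the total number of observed failures, because λ̂ multiplied by the total time on test equals the number of failures; the information lies in how the differences are distributed across intervals.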

This book presents recent innovations in the area as well as important results previously published only in Russian. Different approaches to the choice of grouping intervals are considered. The chi-squared tests are compared in terms of power with other goodness-of-fit tests (such as the Kolmogorov, Cramér-von Mises-Smirnov, Anderson-Darling and Zhang tests) when testing close competing hypotheses. Lemeshko made a considerable contribution to the development of this direction (see, for example, [LEM 98], [LEM 00], [LEM 15b], [LEM 09c] and [LEM 10b]).

The book is intended for researchers interested in statistical testing on the basis of censored and complete data, as well as for university teachers and students. It will also be useful for practitioners in industry and finance, as well as in survival analysis and reliability, who deal with data analysis.