Details

Categorical Data Analysis by Example


1st edition

by: Graham J. G. Upton

€80.99

Publisher: Wiley
Format: PDF
Published: 20 October 2016
ISBN/EAN: 9781119307914
Language: English
Number of pages: 216

DRM-protected eBook: to read it you need, for example, Adobe Digital Editions and an Adobe ID.

Description

<p><b>Introduces the key concepts in the analysis of categorical data with illustrative examples and accompanying R code</b></p> <p>This book is aimed at all those who wish to discover how to analyze categorical data without getting immersed in complicated mathematics and without needing to wade through a large amount of prose. It is aimed at researchers with their own data ready to be analyzed and at students who would like an approachable alternative view of the subject.</p> <p>Each new topic in categorical data analysis is illustrated with an example that readers can apply to their own sets of data. In many cases, R code is given and excerpts from the resulting output are presented. In the context of log-linear models for cross-tabulations, two specialties of the house have been included: the use of cobweb diagrams to get visual information concerning significant interactions, and a procedure for detecting outlier category combinations. The R code used for these is available and may be freely adapted. In addition, this book:</p> <ul> <li>Uses an example to illustrate each new topic in categorical data</li> <li>Provides a clear explanation of an important subject</li> <li>Is understandable to most readers with minimal statistical and mathematical backgrounds</li> <li>Contains examples that are accompanied by R code and resulting output</li> <li>Includes starred sections that provide more background details for interested readers</li> </ul> <p><i>Categorical Data Analysis by Example </i>is a reference for students in statistics and researchers in other disciplines, especially the social sciences, who use categorical data. This book is also a reference for practitioners in market research, medicine, and other fields.</p>
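As a taste of the material covered in Chapters 2–4, here is a minimal sketch of the Pearson <i>X</i><sup>2</sup> and likelihood-ratio <i>G</i><sup>2</sup> statistics for testing independence in a 2 × 2 table. The book itself works in R; this illustration is in Python, and the cell counts are invented purely for demonstration.

```python
import math

# Hypothetical 2 x 2 contingency table (rows: groups, columns: outcomes);
# the counts are made up for illustration only.
table = [[10, 20],
         [30, 40]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

x2 = 0.0  # Pearson's X^2 statistic (Section 2.1.1)
g2 = 0.0  # likelihood-ratio statistic G^2 (Section 2.1.3)
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # fitted cell count under the independence model
        expected = row_totals[i] * col_totals[j] / n
        x2 += (observed - expected) ** 2 / expected
        g2 += 2 * observed * math.log(observed / expected)

print(f"X2 = {x2:.3f}, G2 = {g2:.3f}")  # X2 = 0.794, G2 = 0.804
```

For a 2 × 2 table both statistics are referred to a chi-squared distribution with one degree of freedom, and, as Section 2.1.4 of the book discusses, they usually take similar values.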
<p>Preface xi</p> <p>Acknowledgments xiii</p> <p><b>1 Introduction 1</b></p> <p>1.1 What are categorical data? 1</p> <p>1.2 A typical data set 2</p> <p>1.3 Visualisation and crosstabulation 3</p> <p>1.4 Samples, populations, and random variation 4</p> <p>1.5 Proportion, probability and conditional probability 5</p> <p>1.6 Probability distributions 6</p> <p>1.6.1 The binomial distribution 6</p> <p>1.6.2 The multinomial distribution 7</p> <p>1.6.3 The Poisson distribution 7</p> <p>1.6.4 The normal distribution 7</p> <p>1.6.5 The chi-squared (<i>χ</i><sup>2</sup>) distribution 8</p> <p>1.7 *The likelihood 9</p> <p><b>2 Estimation and inference for categorical data 11</b></p> <p>2.1 Goodness of fit 11</p> <p>2.1.1 Pearson’s <i>X</i><sup>2</sup> goodness-of-fit statistic 11</p> <p>2.1.2 *The link between <i>X</i><sup>2</sup> and the Poisson and <i>χ</i><sup>2</sup> distributions 12</p> <p>2.1.3 The likelihood-ratio goodness-of-fit statistic, <i>G</i><sup>2</sup> 13</p> <p>2.1.4 *Why the <i>G</i><sup>2</sup> and <i>X</i><sup>2</sup> statistics usually have similar values 14</p> <p>2.2 Hypothesis tests for a binomial proportion (large sample) 14</p> <p>2.2.1 The normal score test 14</p> <p>2.2.2 *Link to Pearson’s <i>X</i><sup>2</sup> goodness-of-fit test 15</p> <p>2.2.3 <i>G</i><sup>2</sup> for a binomial proportion 15</p> <p>2.3 Hypothesis tests for a binomial proportion (small sample) 16</p> <p>2.3.1 One-tailed hypothesis test 16</p> <p>2.3.2 Two-tailed hypothesis tests 17</p> <p>2.4 Interval estimates for a binomial proportion 18</p> <p>2.4.1 Laplace’s method 18</p> <p>2.4.2 Wilson’s method 18</p> <p>2.4.3 The Agresti-Coull method 19</p> <p>2.4.4 Small samples and exact calculations 19</p> <p><b>3 The 2 × 2 contingency table 23</b></p> <p>3.1 Introduction 23</p> <p>3.2 Fisher’s exact test (for independence) 24</p> <p>3.2.1 *Derivation of the exact test formula 26</p> <p>3.3 Testing independence with large cell frequencies 27</p> <p>3.3.1 Using Pearson’s goodness-of-fit test 27</p>
<p>3.3.2 The Yates correction 28</p> <p>3.4 The 2 × 2 table in a medical context 29</p> <p>3.5 Measuring lack of independence (comparing proportions) 31</p> <p>3.5.1 Difference of proportions 31</p> <p>3.5.2 Relative risk 32</p> <p>3.5.3 Odds-ratio 33</p> <p><b>4 The <i>I</i> × <i>J</i> contingency table 37</b></p> <p>4.1 Notation 37</p> <p>4.2 Independence in the <i>I</i> × <i>J</i> contingency table 38</p> <p>4.2.1 Estimation and degrees of freedom 38</p> <p>4.2.2 Odds-ratios and independence 39</p> <p>4.2.3 Goodness-of-fit and lack of fit of the independence model 39</p> <p>4.3 Partitioning 42</p> <p>4.3.1 *Additivity of <i>G</i><sup>2</sup> 42</p> <p>4.3.2 Rules for partitioning 44</p> <p>4.4 Graphical displays 44</p> <p>4.4.1 Mosaic plots 45</p> <p>4.4.2 Cobweb diagrams 45</p> <p>4.5 Testing independence with ordinal variables 46</p> <p><b>5 The exponential family 51</b></p> <p>5.1 Introduction 51</p> <p>5.2 The exponential family 52</p> <p>5.2.1 The exponential dispersion family 53</p> <p>5.3 Components of a general linear model 53</p> <p>5.4 Estimation 54</p> <p><b>6 A model taxonomy 57</b></p> <p>6.1 Underlying questions 57</p> <p>6.1.1 Which variables are of interest? 57</p> <p>6.1.2 What categories should be used? 58</p> <p>6.1.3 What is the type of each variable? 58</p> <p>6.1.4 What is the nature of each variable? 58</p> <p>6.2 Identifying the type of model 58</p> <p><b>7 The 2 × <i>J</i> contingency table 61</b></p> <p>7.1 A problem with <i>X</i><sup>2</sup> (and <i>G</i><sup>2</sup>) 61</p> <p>7.2 Using the logit 62</p> <p>7.2.1 Estimation of the logit 63</p> <p>7.2.2 The null model 64</p> <p>7.3 Individual data and grouped data 64</p> <p>7.4 Precision, confidence intervals, and prediction intervals 69</p> <p>7.4.1 Prediction intervals 70</p> <p>7.5 Logistic regression with a categorical explanatory variable 70</p> <p>7.5.1 Parameter estimates with categorical variables (<i>J</i> > 2) 73</p> <p>7.5.2 The dummy variable representation of a categorical variable 74</p> <p><b>8 Logistic regression with several explanatory variables 77</b></p> <p>8.1 Degrees of freedom when there are no interactions 77</p> <p>8.2 Getting a feel for the data 79</p> <p>8.3 Models with two-variable interactions 81</p> <p>8.3.1 Link to the testing of independence between two variables 83</p> <p><b>9 Model selection and diagnostics 85</b></p> <p>9.1 Introduction 85</p> <p>9.1.1 Ockham’s razor 86</p> <p>9.2 Notation for interactions and for models 87</p> <p>9.3 Stepwise methods for model selection using <i>G</i><sup>2</sup> 89</p> <p>9.3.1 Forward selection 89</p> <p>9.3.2 Backward elimination 91</p> <p>9.3.3 Complete stepwise 93</p> <p>9.4 AIC and related measures 93</p> <p>9.5 The problem caused by rare combinations of events 95</p> <p>9.5.1 Tackling the problem 96</p> <p>9.6 Simplicity versus accuracy 98</p> <p>9.7 DFBETAS 100</p> <p><b>10 Multinomial logistic regression 103</b></p> <p>10.1 A single continuous explanatory variable 103</p> <p>10.2 Nominal categorical explanatory variables 106</p> <p>10.3 Models for an ordinal response variable 108</p> <p>10.3.1 Cumulative logits 108</p> <p>10.3.2 Proportional odds models 109</p> <p>10.3.3 Adjacent-category logit models 114</p> <p>10.3.4 Continuation-ratio logit models 115</p> <p><b>11 Log-linear models for <i>I</i> × <i>J</i> tables 119</b></p> <p>11.1 The saturated model 119</p>
<p>11.1.1 Cornered constraints 120</p> <p>11.1.2 Centered constraints 122</p> <p>11.2 The independence model for an <i>I</i> × <i>J</i> table 125</p> <p><b>12 Log-linear models for <i>I</i> × <i>J</i> × <i>K</i> tables 129</b></p> <p>12.1 Mutual independence: A=B=C 131</p> <p>12.2 The model AB=C 131</p> <p>12.3 Conditional independence and independence 133</p> <p>12.4 The model AB=AC 134</p> <p>12.5 The models AB=AC=BC and ABC 135</p> <p>12.6 Simpson’s paradox 135</p> <p>12.7 Connection between log-linear models and logistic regression 137</p> <p><b>13 Implications and uses of Birch’s result 141</b></p> <p>13.1 Birch’s result 141</p> <p>13.2 Iterative scaling 142</p> <p>13.3 The hierarchy constraint 143</p> <p>13.4 Inclusion of the all-factor interaction 144</p> <p>13.5 Mostellerising 145</p> <p><b>14 Model selection for log-linear models 149</b></p> <p>14.1 Three variables 150</p> <p>14.2 More than three variables 153</p> <p><b>15 Incomplete tables, dummy variables, and outliers 157</b></p> <p>15.1 Incomplete tables 157</p> <p>15.1.1 Degrees of freedom 158</p> <p>15.2 Quasi-independence 159</p> <p>15.3 Dummy variables 159</p> <p>15.4 Detection of outliers 160</p> <p><b>16 Panel data and repeated measures 165</b></p> <p>16.1 The mover-stayer model 166</p> <p>16.2 The loyalty model 168</p> <p>16.3 Symmetry 169</p> <p>16.4 Quasi-symmetry 170</p> <p>16.5 The loyalty-distance model 172</p> <p>A R code for the Cobweb function 175</p> <p>Index 179</p> <p>Author Index 183</p> <p>Index of Examples 185</p>
<p>"Concise introduction to dealing with categorical data (with supporting R code) which will help the general data scientist." (<b>Raspberry Pi</b>, March 2017)</p>
<p><b>GRAHAM J. G. UPTON</b> was formerly Professor of Applied Statistics in the Department of Mathematical Sciences, University of Essex. Dr. Upton is the author of <i>The Analysis of Cross-tabulated Data</i> (1978) and joint author of <i>Spatial Data Analysis by Example</i> (2 volumes, 1995), both published by Wiley. He is the lead author of <i>The Oxford Dictionary of Statistics</i> (OUP, 2014). His books have been translated into Japanese, Russian, and Welsh.</p>
