Details

Categorical Data Analysis by Example


1st edition

by: Graham J. G. Upton

€80.99

Publisher: Wiley
Format: PDF
Published: 20 October 2016
ISBN/EAN: 9781119307914
Language: English
Number of pages: 216

DRM-protected eBook: to read it you need, for example, Adobe Digital Editions and an Adobe ID.

Description

<p><b>Introduces the key concepts in the analysis of categorical data with illustrative examples and accompanying R code</b></p> <p>This book is aimed at all those who wish to discover how to analyze categorical data without getting immersed in complicated mathematics and without needing to wade through a large amount of prose. It is aimed at researchers with their own data ready to be analyzed and at students who would like an approachable alternative view of the subject.</p> <p>Each new topic in categorical data analysis is illustrated with an example that readers can apply to their own sets of data. In many cases, R code is given and excerpts from the resulting output are presented. In the context of log-linear models for cross-tabulations, two specialties of the house have been included: the use of cobweb diagrams to get visual information concerning significant interactions, and a procedure for detecting outlier category combinations. The R code used for these is available and may be freely adapted. In addition, this book:</p> <ul> <li>Uses an example to illustrate each new topic in categorical data</li> <li>Provides a clear explanation of an important subject</li> <li>Is understandable to most readers with minimal statistical and mathematical backgrounds</li> <li>Contains examples that are accompanied by R code and resulting output</li> <li>Includes starred sections that provide more background details for interested readers</li> </ul> <p><i>Categorical Data Analysis by Example </i>is a reference for students in statistics and researchers in other disciplines, especially the social sciences, who use categorical data. This book is also a reference for practitioners in market research, medicine, and other fields.</p>
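As a taste of the material covered in Chapters 2–4, here is a minimal sketch of the Pearson <i>X</i><sup>2</sup> and likelihood-ratio <i>G</i><sup>2</sup> statistics for testing independence in a 2 × 2 table. The book itself works in R; this illustration is in Python, and the cell counts are invented purely for demonstration.

```python
import math

# Hypothetical 2 x 2 contingency table (rows: groups, columns: outcomes);
# the counts are made up for illustration only.
table = [[10, 20],
         [30, 40]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

x2 = 0.0  # Pearson's X^2 statistic (Section 2.1.1)
g2 = 0.0  # likelihood-ratio statistic G^2 (Section 2.1.3)
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # fitted cell count under the independence model
        expected = row_totals[i] * col_totals[j] / n
        x2 += (observed - expected) ** 2 / expected
        g2 += 2 * observed * math.log(observed / expected)

print(f"X2 = {x2:.3f}, G2 = {g2:.3f}")  # X2 = 0.794, G2 = 0.804
```

For a 2 × 2 table both statistics are referred to a chi-squared distribution with one degree of freedom, and, as Section 2.1.4 of the book discusses, they usually take similar values.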
<p>Preface xi</p> <p>Acknowledgments xiii</p> <p><b>1 Introduction 1</b></p> <p>1.1 What are categorical data? 1</p> <p>1.2 A typical data set 2</p> <p>1.3 Visualisation and crosstabulation 3</p> <p>1.4 Samples, populations, and random variation 4</p> <p>1.5 Proportion, probability and conditional probability 5</p> <p>1.6 Probability distributions 6</p> <p>1.6.1 The binomial distribution 6</p> <p>1.6.2 The multinomial distribution 7</p> <p>1.6.3 The Poisson distribution 7</p> <p>1.6.4 The normal distribution 7</p> <p>1.6.5 The chi-squared (<i>χ</i><sup>2</sup>) distribution 8</p> <p>1.7 *The likelihood 9</p> <p><b>2 Estimation and inference for categorical data 11</b></p> <p>2.1 Goodness of fit 11</p> <p>2.1.1 Pearson’s <i>X</i><sup>2</sup> goodness-of-fit statistic 11</p> <p>2.1.2 *The link between <i>X</i><sup>2</sup> and the Poisson and <i>χ</i><sup>2</sup> distributions 12</p> <p>2.1.3 The likelihood-ratio goodness-of-fit statistic, <i>G</i><sup>2</sup> 13</p> <p>2.1.4 *Why the <i>G</i><sup>2</sup> and <i>X</i><sup>2</sup> statistics usually have similar values 14</p> <p>2.2 Hypothesis tests for a binomial proportion (large sample) 14</p> <p>2.2.1 The normal score test 14</p> <p>2.2.2 *Link to Pearson’s <i>X</i><sup>2</sup> goodness-of-fit test 15</p> <p>2.2.3 <i>G</i><sup>2</sup> for a binomial proportion 15</p> <p>2.3 Hypothesis tests for a binomial proportion (small sample) 16</p> <p>2.3.1 One-tailed hypothesis test 16</p> <p>2.3.2 Two-tailed hypothesis tests 17</p> <p>2.4 Interval estimates for a binomial proportion 18</p> <p>2.4.1 Laplace’s method 18</p> <p>2.4.2 Wilson’s method 18</p> <p>2.4.3 The Agresti-Coull method 19</p> <p>2.4.4 Small samples and exact calculations 19</p> <p><b>3 The 2 × 2 contingency table 23</b></p> <p>3.1 Introduction 23</p> <p>3.2 Fisher’s exact test (for independence) 24</p> <p>3.2.1 *Derivation of the exact test formula 26</p> <p>3.3 Testing independence with large cell frequencies 27</p> <p>3.3.1 Using Pearson’s goodness-of-fit test 27</p>
<p>3.3.2 The Yates correction 28</p> <p>3.4 The 2 × 2 table in a medical context 29</p> <p>3.5 Measuring lack of independence (comparing proportions) 31</p> <p>3.5.1 Difference of proportions 31</p> <p>3.5.2 Relative risk 32</p> <p>3.5.3 Odds-ratio 33</p> <p><b>4 The <i>I</i> × <i>J</i> contingency table 37</b></p> <p>4.1 Notation 37</p> <p>4.2 Independence in the <i>I</i> × <i>J</i> contingency table 38</p> <p>4.2.1 Estimation and degrees of freedom 38</p> <p>4.2.2 Odds-ratios and independence 39</p> <p>4.2.3 Goodness-of-fit and lack of fit of the independence model 39</p> <p>4.3 Partitioning 42</p> <p>4.3.1 *Additivity of <i>G</i><sup>2</sup> 42</p> <p>4.3.2 Rules for partitioning 44</p> <p>4.4 Graphical displays 44</p> <p>4.4.1 Mosaic plots 45</p> <p>4.4.2 Cobweb diagrams 45</p> <p>4.5 Testing independence with ordinal variables 46</p> <p><b>5 The exponential family 51</b></p> <p>5.1 Introduction 51</p> <p>5.2 The exponential family 52</p> <p>5.2.1 The exponential dispersion family 53</p> <p>5.3 Components of a general linear model 53</p> <p>5.4 Estimation 54</p> <p><b>6 A model taxonomy 57</b></p> <p>6.1 Underlying questions 57</p> <p>6.1.1 Which variables are of interest? 57</p> <p>6.1.2 What categories should be used? 58</p> <p>6.1.3 What is the type of each variable? 58</p> <p>6.1.4 What is the nature of each variable? 58</p> <p>6.2 Identifying the type of model 58</p> <p><b>7 The 2 × <i>J</i> contingency table 61</b></p> <p>7.1 A problem with <i>X</i><sup>2</sup> (and <i>G</i><sup>2</sup>) 61</p> <p>7.2 Using the logit 62</p> <p>7.2.1 Estimation of the logit 63</p> <p>7.2.2 The null model 64</p> <p>7.3 Individual data and grouped data 64</p> <p>7.4 Precision, confidence intervals, and prediction intervals 69</p> <p>7.4.1 Prediction intervals 70</p> <p>7.5 Logistic regression with a categorical explanatory variable 70</p> <p>7.5.1 Parameter estimates with categorical variables (<i>J</i> > 2) 73</p> <p>7.5.2 The dummy variable representation of a categorical variable 74</p> <p><b>8 Logistic regression with several explanatory variables 77</b></p> <p>8.1 Degrees of freedom when there are no interactions 77</p> <p>8.2 Getting a feel for the data 79</p> <p>8.3 Models with two-variable interactions 81</p> <p>8.3.1 Link to the testing of independence between two variables 83</p> <p><b>9 Model selection and diagnostics 85</b></p> <p>9.1 Introduction 85</p> <p>9.1.1 Ockham’s razor 86</p> <p>9.2 Notation for interactions and for models 87</p> <p>9.3 Stepwise methods for model selection using <i>G</i><sup>2</sup> 89</p> <p>9.3.1 Forward selection 89</p> <p>9.3.2 Backward elimination 91</p> <p>9.3.3 Complete stepwise 93</p> <p>9.4 AIC and related measures 93</p> <p>9.5 The problem caused by rare combinations of events 95</p> <p>9.5.1 Tackling the problem 96</p> <p>9.6 Simplicity versus accuracy 98</p> <p>9.7 DFBETAS 100</p> <p><b>10 Multinomial logistic regression 103</b></p> <p>10.1 A single continuous explanatory variable 103</p> <p>10.2 Nominal categorical explanatory variables 106</p> <p>10.3 Models for an ordinal response variable 108</p> <p>10.3.1 Cumulative logits 108</p> <p>10.3.2 Proportional odds models 109</p> <p>10.3.3 Adjacent-category logit models 114</p> <p>10.3.4 Continuation-ratio logit models 115</p> <p><b>11 Log-linear models for <i>I</i> × <i>J</i> tables 119</b></p> <p>11.1 The saturated model 119</p>
<p>11.1.1 Cornered constraints 120</p> <p>11.1.2 Centered constraints 122</p> <p>11.2 The independence model for an <i>I</i> × <i>J</i> table 125</p> <p><b>12 Log-linear models for <i>I</i> × <i>J</i> × <i>K</i> tables 129</b></p> <p>12.1 Mutual independence: A=B=C 131</p> <p>12.2 The model AB=C 131</p> <p>12.3 Conditional independence and independence 133</p> <p>12.4 The model AB=AC 134</p> <p>12.5 The models AB=AC=BC and ABC 135</p> <p>12.6 Simpson’s paradox 135</p> <p>12.7 Connection between log-linear models and logistic regression 137</p> <p><b>13 Implications and uses of Birch’s result 141</b></p> <p>13.1 Birch’s result 141</p> <p>13.2 Iterative scaling 142</p> <p>13.3 The hierarchy constraint 143</p> <p>13.4 Inclusion of the all-factor interaction 144</p> <p>13.5 Mostellerising 145</p> <p><b>14 Model selection for log-linear models 149</b></p> <p>14.1 Three variables 150</p> <p>14.2 More than three variables 153</p> <p><b>15 Incomplete tables, dummy variables, and outliers 157</b></p> <p>15.1 Incomplete tables 157</p> <p>15.1.1 Degrees of freedom 158</p> <p>15.2 Quasi-independence 159</p> <p>15.3 Dummy variables 159</p> <p>15.4 Detection of outliers 160</p> <p><b>16 Panel data and repeated measures 165</b></p> <p>16.1 The mover-stayer model 166</p> <p>16.2 The loyalty model 168</p> <p>16.3 Symmetry 169</p> <p>16.4 Quasi-symmetry 170</p> <p>16.5 The loyalty-distance model 172</p> <p>A R code for the Cobweb function 175</p> <p>Index 179</p> <p>Author Index 183</p> <p>Index of Examples 185</p>
<p>"Concise introduction to dealing with categorical data (with supporting R code) which will help the general data scientist." (<b>Raspberry Pi</b>, March 2017)</p>
<p><b>GRAHAM J. G. UPTON</b> was formerly Professor of Applied Statistics in the Department of Mathematical Sciences, University of Essex. Dr. Upton is the author of <i>The Analysis of Cross-tabulated Data</i> (1978) and joint author of <i>Spatial Data Analysis by Example</i> (2 volumes, 1995), both published by Wiley. He is the lead author of <i>The Oxford Dictionary of Statistics</i> (OUP, 2014). His books have been translated into Japanese, Russian, and Welsh.</p>
