Univariate, Bivariate, and Multivariate Statistics Using R

Quantitative Tools for Data Analysis and Data Science
1st edition

By: Daniel J. Denis

107,99 €

Publisher: Wiley
Format: PDF
Published: 16.04.2020
ISBN/EAN: 9781119549956
Language: English
Pages: 384

DRM-protected eBook; to read it you will need e.g. Adobe Digital Editions and an Adobe ID.

Description

<p><b>A practical source for performing essential statistical analyses and data management tasks in R</b></p> <p><i>Univariate, Bivariate, and Multivariate Statistics Using R</i> offers a practical and very user-friendly introduction to R software, covering a range of statistical methods featured in data analysis and data science. The author, a noted expert in quantitative teaching, has written a quick go-to reference for performing essential statistical analyses and data management tasks in R. Requiring only minimal prior knowledge, the book introduces the concepts needed for an immediate yet clear understanding of the statistics essential to interpreting software output.</p> <p>The author explores univariate, bivariate, and multivariate statistical methods, as well as select nonparametric tests. Altogether, it is a hands-on manual covering the applied statistics and essential R computing skills needed to write theses, dissertations, and research publications. The book is comprehensive in its coverage of univariate through multivariate procedures, while serving as a friendly and gentle introduction to R software for the newcomer. 
This important resource:</p> <ul> <li>Offers an introductory, concise guide to the computational tools useful for making sense of data with R statistical software</li> <li>Provides a resource for students and professionals in the social, behavioral, and natural sciences</li> <li>Emphasizes the computational tools used in the discovery of empirical patterns</li> <li>Features a variety of popular statistical analyses and data management tasks that can be applied immediately to research projects</li> <li>Shows how to apply statistical analysis in R to data sets so readers can quickly get started performing essential tasks in data analysis and data science</li> </ul> <p>Written for students, professionals, and researchers primarily in the social, behavioral, and natural sciences, <i>Univariate, Bivariate, and Multivariate Statistics Using R</i> offers an easy-to-use guide for performing data analysis <i>fast</i>, with an emphasis on drawing conclusions from empirical observations. The book can also serve as a primary or secondary textbook for courses in data analysis or data science, or other courses in which quantitative methods are featured.</p>
<p>Preface xiii</p> <p><b>1 Introduction to Applied Statistics </b><b>1</b></p> <p>1.1 The Nature of Statistics and Inference 2</p> <p>1.2 A Motivating Example 3</p> <p>1.3 What About “Big Data”? 4</p> <p>1.4 Approach to Learning R 7</p> <p>1.5 Statistical Modeling in a Nutshell 7</p> <p>1.6 Statistical Significance Testing and Error Rates 10</p> <p>1.7 Simple Example of Inference Using a Coin 11</p> <p>1.8 Statistics is for Messy Situations 13</p> <p>1.9 Type I versus Type II Errors 14</p> <p>1.10 Point Estimates and Confidence Intervals 15</p> <p>1.11 So What Can We Conclude from One Confidence Interval? 18</p> <p>1.12 Variable Types 19</p> <p>1.13 Sample Size, Statistical Power, and Statistical Significance 22</p> <p>1.14 How “<i>p </i>< 0.05” Happens 23</p> <p>1.15 Effect Size 25</p> <p>1.16 The Verdict on Significance Testing 26</p> <p>1.17 Training versus Test Data 27</p> <p>1.18 How to Get the Most Out of This Book 28</p> <p>Exercises 29</p> <p><b>2 Introduction to R and Computational Statistics </b><b>31</b></p> <p>2.1 How to Install R on Your Computer 34</p> <p>2.2 How to Do Basic Mathematics with R 35</p> <p>2.2.1 Combinations and Permutations 38</p> <p>2.2.2 Plotting Curves Using curve() 39</p> <p>2.3 Vectors and Matrices in R 41</p> <p>2.4 Matrices in R 44</p> <p>2.4.1 The Inverse of a Matrix 47</p> <p>2.4.2 Eigenvalues and Eigenvectors 49</p> <p>2.5 How to Get Data into R 52</p> <p>2.6 Merging Data Frames 55</p> <p>2.7 How to Install a Package in R, and How to Use It 55</p> <p>2.8 How to View the Top, Bottom, and “Some” of a Data File 58</p> <p>2.9 How to Select Subsets from a Dataframe 60</p> <p>2.10 How R Deals with Missing Data 62</p> <p>2.11 Using ls( ) to See Objects in the Workspace 63</p> <p>2.12 Writing Your Own Functions 65</p> <p>2.13 Writing Scripts 65</p> <p>2.14 How to Create Factors in R 66</p> <p>2.15 Using the table() Function 67</p> <p>2.16 Requesting a Demonstration Using the example() Function 68</p> <p>2.17 Citing R in Publications 
69</p> <p>Exercises 69</p> <p><b>3 Exploring Data with R: Essential Graphics and Visualization </b><b>71</b></p> <p>3.1 Statistics, R, and Visualization 71</p> <p>3.2 R’s plot() Function 73</p> <p>3.3 Scatterplots and Depicting Data in Two or More Dimensions 77</p> <p>3.4 Communicating Density in a Plot 79</p> <p>3.5 Stem-and-Leaf Plots 85</p> <p>3.6 Assessing Normality 87</p> <p>3.7 Box-and-Whisker Plots 89</p> <p>3.8 Violin Plots 95</p> <p>3.9 Pie Graphs and Charts 97</p> <p>3.10 Plotting Tables 98</p> <p>Exercises 99</p> <p><b>4 Means, Correlations, Counts: Drawing Inferences Using Easy-to-Implement Statistical Tests </b><b>101</b></p> <p>4.1 Computing <i>z </i>and Related Scores in R 101</p> <p>4.2 Plotting Normal Distributions 105</p> <p>4.3 Correlation Coefficients in R 106</p> <p>4.4 Evaluating Pearson’s <i>r </i>for Statistical Significance 110</p> <p>4.5 Spearman’s Rho: A Nonparametric Alternative to Pearson 111</p> <p>4.6 Alternative Correlation Coefficients in R 113</p> <p>4.7 Tests of Mean Differences 114</p> <p>4.7.1 <i>t</i>-Tests for One Sample 114</p> <p>4.7.2 Two-Sample <i>t</i>-Test 115</p> <p>4.7.3 Was the Welch Test Necessary? 117</p> <p>4.7.4 <i>t</i>-Test via Linear Model Set-up 118</p> <p>4.7.5 Paired-Samples <i>t</i>-Test 118</p> <p>4.8 Categorical Data 120</p> <p>4.8.1 Binomial Test 120</p> <p>4.8.2 Categorical Data Having More Than Two Possibilities 123</p> <p>4.9 Radar Charts 126</p> <p>4.10 Cohen’s Kappa 127</p> <p>Exercises 129</p> <p><b>5 Power Analysis and Sample Size Estimation Using R </b><b>131</b></p> <p>5.1 What is Statistical Power? 131</p> <p>5.2 Does That Mean Power and Huge Sample Sizes Are “Bad?” 133</p> <p>5.3 Should I Be Estimating Power or Sample Size? 134</p> <p>5.4 How Do I Know What the Effect Size Should Be? 
135</p> <p>5.4.1 Ways of Setting Effect Size in Power Analyses 135</p> <p>5.5 Power for <i>t</i>-Tests 136</p> <p>5.5.1 Example: Treatment versus Control Experiment 137</p> <p>5.5.2 Extremely Small Effect Size 138</p> <p>5.6 Estimating Power for a Given Sample Size 140</p> <p>5.7 Power for Other Designs – The Principles Are the Same 140</p> <p>5.7.1 Power for One-Way ANOVA 141</p> <p>5.7.2 Converting <i>R</i><sup>2</sup> to <i>f </i>143</p> <p>5.8 Power for Correlations 143</p> <p>5.9 Concluding Thoughts on Power 145</p> <p>Exercises 146</p> <p><b>6 Analysis of Variance: Fixed Effects, Random Effects, Mixed Models, and Repeated Measures </b><b>147</b></p> <p>6.1 Revisiting <i>t</i>-Tests 147</p> <p>6.2 Introducing the Analysis of Variance (ANOVA) 149</p> <p>6.2.1 Achievement as a Function of Teacher 149</p> <p>6.3 Evaluating Assumptions 152</p> <p>6.3.1 Inferential Tests for Normality 153</p> <p>6.3.2 Evaluating Homogeneity of Variances 154</p> <p>6.4 Performing the ANOVA Using aov() 156</p> <p>6.4.1 The Analysis of Variance Summary Table 157</p> <p>6.4.2 Obtaining Treatment Effects 158</p> <p>6.4.3 Plotting Results of the ANOVA 159</p> <p>6.4.4 Post Hoc Tests on the Teacher Factor 159</p> <p>6.5 Alternative Way of Getting ANOVA Results via lm() 161</p> <p>6.5.1 Contrasts in lm() versus Tukey’s HSD 163</p> <p>6.6 Factorial Analysis of Variance 163</p> <p>6.6.1 Why Not Do Two One-Way ANOVAs? 163</p> <p>6.7 Example of Factorial ANOVA 166</p> <p>6.7.1 Graphing Main Effects and Interaction in the Same Plot 171</p> <p>6.8 Should Main Effects Be Interpreted in the Presence of Interaction? 
172</p> <p>6.9 Simple Main Effects 173</p> <p>6.10 Random Effects ANOVA and Mixed Models 175</p> <p>6.10.1 A Rationale for Random Factors 176</p> <p>6.10.2 One-Way Random Effects ANOVA in R 177</p> <p>6.11 Mixed Models 180</p> <p>6.12 Repeated-Measures Models 181</p> <p>Exercises 186</p> <p><b>7 Simple and Multiple Linear Regression </b><b>189</b></p> <p>7.1 Simple Linear Regression 190</p> <p>7.2 Ordinary Least-Squares Regression 192</p> <p>7.3 Adjusted <i>R</i><sup>2</sup> 198</p> <p>7.4 Multiple Regression Analysis 199</p> <p>7.5 Verifying Model Assumptions 202</p> <p>7.6 Collinearity Among Predictors and the Variance Inflation Factor 206</p> <p>7.7 Model-Building and Selection Algorithms 209</p> <p>7.7.1 Simultaneous Inference 209</p> <p>7.7.2 Hierarchical Regression 210</p> <p>7.7.2.1 Example of Hierarchical Regression 211</p> <p>7.8 Statistical Mediation 214</p> <p>7.9 Best Subset and Forward Regression 217</p> <p>7.9.1 How Forward Regression Works 218</p> <p>7.10 Stepwise Selection 219</p> <p>7.11 The Controversy Surrounding Selection Methods 221</p> <p>Exercises 223</p> <p><b>8 Logistic Regression and the Generalized Linear Model </b><b>225</b></p> <p>8.1 The “Why” Behind Logistic Regression 225</p> <p>8.2 Example of Logistic Regression in R 229</p> <p>8.3 Introducing the Logit: The Log of the Odds 232</p> <p>8.4 The Natural Log of the Odds 233</p> <p>8.5 From Logits Back to Odds 235</p> <p>8.6 Full Example of Logistic Regression 236</p> <p>8.6.1 Challenger O-ring Data 236</p> <p>8.7 Logistic Regression on Challenger Data 240</p> <p>8.8 Analysis of Deviance Table 241</p> <p>8.9 Predicting Probabilities 242</p> <p>8.10 Assumptions of Logistic Regression 243</p> <p>8.11 Multiple Logistic Regression 244</p> <p>8.12 Training Error Rate Versus Test Error Rate 247</p> <p>Exercises 248</p> <p><b>9 Multivariate Analysis of Variance (MANOVA) and Discriminant Analysis </b><b>251</b></p> <p>9.1 Why Conduct MANOVA? 
252</p> <p>9.2 Multivariate Tests of Significance 254</p> <p>9.3 Example of MANOVA in R 257</p> <p>9.4 Effect Size for MANOVA 259</p> <p>9.5 Evaluating Assumptions in MANOVA 261</p> <p>9.6 Outliers 262</p> <p>9.7 Homogeneity of Covariance Matrices 263</p> <p>9.7.1 What if the Box-M Test Had Suggested a Violation? 264</p> <p>9.8 Linear Discriminant Function Analysis 265</p> <p>9.9 Theory of Discriminant Analysis 266</p> <p>9.10 Discriminant Analysis in R 267</p> <p>9.11 Computing Discriminant Scores Manually 270</p> <p>9.12 Predicting Group Membership 271</p> <p>9.13 How Well Did the Discriminant Function Analysis Do? 272</p> <p>9.14 Visualizing Separation 275</p> <p>9.15 Quadratic Discriminant Analysis 276</p> <p>9.16 Regularized Discriminant Analysis 278</p> <p>Exercises 278</p> <p><b>10 Principal Component Analysis </b><b>281</b></p> <p>10.1 Principal Component Analysis Versus Factor Analysis 282</p> <p>10.2 A Very Simple Example of PCA 283</p> <p>10.2.1 Pearson’s 1901 Data 284</p> <p>10.2.2 Assumptions of PCA 286</p> <p>10.2.3 Running the PCA 288</p> <p>10.2.4 Loadings in PCA 290</p> <p>10.3 What Are the Loadings in PCA? 292</p> <p>10.4 Properties of Principal Components 293</p> <p>10.5 Component Scores 294</p> <p>10.6 How Many Components to Keep? 
295</p> <p>10.6.1 The Scree Plot as an Aid to Component Retention 295</p> <p>10.7 Principal Components of USA Arrests Data 297</p> <p>10.8 Unstandardized Versus Standardized Solutions 301</p> <p>Exercises 304</p> <p><b>11 Exploratory Factor Analysis </b><b>307</b></p> <p>11.1 Common Factor Analysis Model 308</p> <p>11.2 A Technical and Philosophical Pitfall of EFA 310</p> <p>11.3 Factor Analysis Versus Principal Component Analysis on the Same Data 311</p> <p>11.3.1 Demonstrating the Non-Uniqueness Issue 311</p> <p>11.4 The Issue of Factor Retention 314</p> <p>11.5 Initial Eigenvalues in Factor Analysis 315</p> <p>11.6 Rotation in Exploratory Factor Analysis 316</p> <p>11.7 Estimation in Factor Analysis 318</p> <p>11.8 Example of Factor Analysis on the Holzinger and Swineford Data 318</p> <p>11.8.1 Obtaining Initial Eigenvalues 323</p> <p>11.8.2 Making Sense of the Factor Solution 324</p> <p>Exercises 325</p> <p><b>12 Cluster Analysis </b><b>327</b></p> <p>12.1 A Simple Example of Cluster Analysis 329</p> <p>12.2 The Concepts of Proximity and Distance in Cluster Analysis 332</p> <p>12.3 <i>k</i>-Means Cluster Analysis 332</p> <p>12.4 Minimizing Criteria 333</p> <p>12.5 Example of <i>k</i>-Means Clustering in R 334</p> <p>12.5.1 Plotting the Data 335</p> <p>12.6 Hierarchical Cluster Analysis 339</p> <p>12.7 Why Clustering is Inherently Subjective 343</p> <p>Exercises 344</p> <p><b>13 Nonparametric Tests </b><b>347</b></p> <p>13.1 Mann–Whitney <i>U </i>Test 348</p> <p>13.2 Kruskal–Wallis Test 349</p> <p>13.3 Nonparametric Test for Paired Comparisons and Repeated Measures 351</p> <p>13.3.1 Wilcoxon Signed-Rank Test and Friedman Test 351</p> <p>13.4 Sign Test 354</p> <p>Exercises 356</p> <p>References 359</p> <p>Index 363</p>
<p><b>DANIEL J. DENIS, P<small>H</small>D,</b> is Professor of Quantitative Psychology in the Department of Psychology at the University of Montana. He is the author of <i>Applied Univariate, Bivariate, and Multivariate Statistics</i> and <i>SPSS Data Analysis for Univariate, Bivariate, and Multivariate Statistics</i>, both published by Wiley.</p>
