Details

Data Mining and Business Analytics with R



1st ed.

by: Johannes Ledolter

103,99 €

Publisher: Wiley
Format: EPUB
Published: 28 May 2013
ISBN/EAN: 9781118572153
Language: English
Number of pages: 368

DRM-protected eBook; to read it you need, for example, Adobe Digital Editions and an Adobe ID.

Descriptions

<p>Collecting, analyzing, and extracting valuable information from a large amount of data requires easily accessible, robust, computational and analytical tools. <i>Data Mining and Business Analytics with R</i> utilizes the open source software R for the analysis, exploration, and simplification of large high-dimensional data sets. As a result, readers are provided with the needed guidance to model and interpret complicated data and become adept at building powerful models for prediction and classification.</p> <p>Highlighting both underlying concepts and practical computational skills, <i>Data Mining and Business Analytics with R</i> begins with coverage of standard linear regression and the importance of parsimony in statistical modeling. The book includes important topics such as penalty-based variable selection (LASSO); logistic regression; regression and classification trees; clustering; principal components and partial least squares; and the analysis of text and network data. In addition, the book presents:</p> <ul> <li>A thorough discussion and extensive demonstration of the theory behind the most useful data mining tools</li> <li>Illustrations of how to use the outlined concepts in real-world situations</li> <li>Readily available additional data sets and related R code allowing readers to apply their own analyses to the discussed materials</li> <li>Numerous exercises to help readers with computing skills and deepen their understanding of the material</li> </ul> <p><i>Data Mining and Business Analytics with R</i> is an excellent graduate-level textbook for courses on data mining and business analytics. The book is also a valuable reference for practitioners who collect and analyze data in the fields of finance, operations management, marketing, and the information sciences.</p>
<p>Preface ix</p> <p>Acknowledgments xi</p> <p><b>1. Introduction 1</b></p> <p>Reference 6</p> <p><b>2. Processing the Information and Getting to Know Your Data 7</b></p> <p>2.1 Example 1: 2006 Birth Data 7</p> <p>2.2 Example 2: Alumni Donations 17</p> <p>2.3 Example 3: Orange Juice 31</p> <p>References 39</p> <p><b>3. Standard Linear Regression 40</b></p> <p>3.1 Estimation in R 43</p> <p>3.2 Example 1: Fuel Efficiency of Automobiles 43</p> <p>3.3 Example 2: Toyota Used-Car Prices 47</p> <p>Appendix 3.A The Effects of Model Overfitting on the Average Mean Square Error of the Regression Prediction 53</p> <p>References 54</p> <p><b>4. Local Polynomial Regression: a Nonparametric Regression Approach 55</b></p> <p>4.1 Model Selection 56</p> <p>4.2 Application to Density Estimation and the Smoothing of Histograms 58</p> <p>4.3 Extension to the Multiple Regression Model 58</p> <p>4.4 Examples and Software 58</p> <p>References 65</p> <p><b>5. Importance of Parsimony in Statistical Modeling 67</b></p> <p>5.1 How Do We Guard Against False Discovery 67</p> <p>References 70</p> <p><b>6. Penalty-Based Variable Selection in Regression Models with Many Parameters (LASSO) 71</b></p> <p>6.1 Example 1: Prostate Cancer 74</p> <p>6.2 Example 2: Orange Juice 78</p> <p>References 82</p> <p><b>7. Logistic Regression 83</b></p> <p>7.1 Building a Linear Model for Binary Response Data 83</p> <p>7.2 Interpretation of the Regression Coefficients in a Logistic Regression Model 85</p> <p>7.3 Statistical Inference 85</p> <p>7.4 Classification of New Cases 86</p> <p>7.5 Estimation in R 87</p> <p>7.6 Example 1: Death Penalty Data 87</p> <p>7.7 Example 2: Delayed Airplanes 92</p> <p>7.8 Example 3: Loan Acceptance 100</p> <p>7.9 Example 4: German Credit Data 103</p> <p>References 107</p> <p><b>8. 
Binary Classification, Probabilities, and Evaluating Classification Performance 108</b></p> <p>8.1 Binary Classification 108</p> <p>8.2 Using Probabilities to Make Decisions 108</p> <p>8.3 Sensitivity and Specificity 109</p> <p>8.4 Example: German Credit Data 109</p> <p><b>9. Classification Using a Nearest Neighbor Analysis 115</b></p> <p>9.1 The k-Nearest Neighbor Algorithm 116</p> <p>9.2 Example 1: Forensic Glass 117</p> <p>9.3 Example 2: German Credit Data 122</p> <p>Reference 125</p> <p><b>10. The Naïve Bayesian Analysis: a Model for Predicting a Categorical Response from Mostly Categorical Predictor Variables 126</b></p> <p>10.1 Example: Delayed Airplanes 127</p> <p>Reference 131</p> <p><b>11. Multinomial Logistic Regression 132</b></p> <p>11.1 Computer Software 134</p> <p>11.2 Example 1: Forensic Glass 134</p> <p>11.3 Example 2: Forensic Glass Revisited 141</p> <p>Appendix 11.A Specification of a Simple Triplet Matrix 147</p> <p>References 149</p> <p><b>12. More on Classification and a Discussion on Discriminant Analysis 150</b></p> <p>12.1 Fisher’s Linear Discriminant Function 153</p> <p>12.2 Example 1: German Credit Data 154</p> <p>12.3 Example 2: Fisher Iris Data 156</p> <p>12.4 Example 3: Forensic Glass Data 157</p> <p>12.5 Example 4: MBA Admission Data 159</p> <p>Reference 160</p> <p><b>13. Decision Trees 161</b></p> <p>13.1 Example 1: Prostate Cancer 167</p> <p>13.2 Example 2: Motorcycle Acceleration 179</p> <p>13.3 Example 3: Fisher Iris Data Revisited 182</p> <p><b>14. 
Further Discussion on Regression and Classification Trees, Computer Software, and Other Useful Classification Methods 185</b></p> <p>14.1 R Packages for Tree Construction 185</p> <p>14.2 Chi-Square Automatic Interaction Detection (CHAID) 186</p> <p>14.3 Ensemble Methods: Bagging, Boosting, and Random Forests 188</p> <p>14.4 Support Vector Machines (SVM) 192</p> <p>14.5 Neural Networks 192</p> <p>14.6 The R Package Rattle: A Useful Graphical User Interface for Data Mining 193</p> <p>References 195</p> <p><b>15. Clustering 196</b></p> <p>15.1 k-Means Clustering 196</p> <p>15.2 Another Way to Look at Clustering: Applying the Expectation-Maximization (EM) Algorithm to Mixtures of Normal Distributions 204</p> <p>15.3 Hierarchical Clustering Procedures 212</p> <p>References 219</p> <p><b>16. Market Basket Analysis: Association Rules and Lift 220</b></p> <p>16.1 Example 1: Online Radio 222</p> <p>16.2 Example 2: Predicting Income 227</p> <p>References 234</p> <p><b>17. Dimension Reduction: Factor Models and Principal Components 235</b></p> <p>17.1 Example 1: European Protein Consumption 238</p> <p>17.2 Example 2: Monthly US Unemployment Rates 243</p> <p><b>18. Reducing the Dimension in Regressions with Multicollinear Inputs: Principal Components Regression and Partial Least Squares 247</b></p> <p>18.1 Three Examples 249</p> <p>References 257</p> <p><b>19. Text as Data: Text Mining and Sentiment Analysis 258</b></p> <p>19.1 Inverse Multinomial Logistic Regression 259</p> <p>19.2 Example 1: Restaurant Reviews 261</p> <p>19.3 Example 2: Political Sentiment 266</p> <p>Appendix 19.A Relationship Between the Gentzkow Shapiro Estimate of “Slant” and Partial Least Squares 268</p> <p>References 271</p> <p><b>20. 
Network Data 272</b></p> <p>20.1 Example 1: Marriage and Power in Fifteenth Century Florence 274</p> <p>20.2 Example 2: Connections in a Friendship Network 278</p> <p>References 292</p> <p>Appendix A: Exercises 293</p> <p>Exercise 1 294</p> <p>Exercise 2 294</p> <p>Exercise 3 296</p> <p>Exercise 4 298</p> <p>Exercise 5 299</p> <p>Exercise 6 300</p> <p>Exercise 7 301</p> <p>Appendix B: References 338</p> <p>Index 341</p>
<p>"I first taught a Ph.D. level course in business applications of data mining 10 years ago. I regularly search the web, looking for business-oriented data mining books, and this is the first one I have found that is suitable for an MS in business analytics. I plan to use it. Anyone who teaches such a class and is inclined toward R should consider this text." (<i>Journal of the American Statistical Association</i>, 1 January 2014)</p>
<p><b>JOHANNES LEDOLTER,</b> PhD, is Professor in both the Department of Management Sciences and the Department of Statistics and Actuarial Science at the University of Iowa. He is a Fellow of the American Statistical Association and the American Society for Quality, and an Elected Member of the International Statistical Institute. Dr. Ledolter is the coauthor of <i>Statistical Methods for Forecasting, Achieving Quality Through Continual Improvement,</i> and <i>Statistical Quality Control: Strategies and Tools for Continual Improvement,</i> all published by Wiley.</p>
<p>Showcases <b>R's</b> critical role in the world of business</p>

You might also be interested in these products:

Statistics for Microarrays
by: Ernst Wit, John McClure
PDF ebook
90,99 €