Details

Data Mining and Predictive Analytics


Wiley Series on Methods and Applications in Data Mining, 2nd edition

by: Daniel T. Larose

111,99 €

Publisher: Wiley
Format: PDF
Published: 19.02.2015
ISBN/EAN: 9781118868676
Language: English
Number of pages: 824

DRM-protected eBook; to read it you will need a reader such as Adobe Digital Editions and an Adobe ID.

Description

Learn methods of data analysis and their application to real-world data sets

This updated second edition serves as an introduction to data mining methods and models, including association rules, clustering, neural networks, logistic regression, and multivariate analysis. The authors apply a unified “white box” approach to data mining methods and models. This approach walks readers through the operations and nuances of the various methods, using small data sets, so readers can gain insight into the inner workings of the method under review. Chapters provide readers with hands-on analysis problems, giving them the opportunity to apply their newly acquired data mining expertise to solving real problems using large, real-world data sets.

Data Mining and Predictive Analytics:

- Offers comprehensive coverage of association rules, clustering, neural networks, logistic regression, multivariate analysis, and the R statistical programming language
- Features over 750 chapter exercises, allowing readers to assess their understanding of the new material
- Provides a detailed case study that brings together the lessons learned in the book
- Includes access to the companion website, www.dataminingconsultant, with exclusive password-protected instructor content

Data Mining and Predictive Analytics will appeal to computer science and statistics students, as well as students in MBA programs and chief executives.
PREFACE xxi
ACKNOWLEDGMENTS xxix

PART I DATA PREPARATION 1

CHAPTER 1 AN INTRODUCTION TO DATA MINING AND PREDICTIVE ANALYTICS 3
1.1 What is Data Mining? What is Predictive Analytics? 3
1.2 Wanted: Data Miners 5
1.3 The Need for Human Direction of Data Mining 6
1.4 The Cross-Industry Standard Process for Data Mining: CRISP-DM 6
1.4.1 CRISP-DM: The Six Phases 7
1.5 Fallacies of Data Mining 9
1.6 What Tasks Can Data Mining Accomplish? 10

CHAPTER 2 DATA PREPROCESSING 20
2.1 Why Do We Need to Preprocess the Data? 20
2.2 Data Cleaning 21
2.3 Handling Missing Data 22
2.4 Identifying Misclassifications 25
2.5 Graphical Methods for Identifying Outliers 26
2.6 Measures of Center and Spread 27
2.7 Data Transformation 30
2.8 Min–Max Normalization 30
2.9 Z-Score Standardization 31
2.10 Decimal Scaling 32
2.11 Transformations to Achieve Normality 32
2.12 Numerical Methods for Identifying Outliers 38
2.13 Flag Variables 39
2.14 Transforming Categorical Variables into Numerical Variables 40
2.15 Binning Numerical Variables 41
2.16 Reclassifying Categorical Variables 42
2.17 Adding an Index Field 43
2.18 Removing Variables that are not Useful 43
2.19 Variables that Should Probably not be Removed 43
2.20 Removal of Duplicate Records 44
2.21 A Word About ID Fields 45

CHAPTER 3 EXPLORATORY DATA ANALYSIS 54
3.1 Hypothesis Testing Versus Exploratory Data Analysis 54
3.2 Getting to Know the Data Set 54
3.3 Exploring Categorical Variables 56
3.4 Exploring Numeric Variables 64
3.5 Exploring Multivariate Relationships 69
3.6 Selecting Interesting Subsets of the Data for Further Investigation 70
3.7 Using EDA to Uncover Anomalous Fields 71
3.8 Binning Based on Predictive Value 72
3.9 Deriving New Variables: Flag Variables 75
3.10 Deriving New Variables: Numerical Variables 77
3.11 Using EDA to Investigate Correlated Predictor Variables 78
3.12 Summary of Our EDA 81

CHAPTER 4 DIMENSION-REDUCTION METHODS 92
4.1 Need for Dimension-Reduction in Data Mining 92
4.2 Principal Components Analysis 93
4.3 Applying PCA to the Houses Data Set 96
4.4 How Many Components Should We Extract? 102
4.5 Profiling the Principal Components 105
4.6 Communalities 108
4.7 Validation of the Principal Components 110
4.8 Factor Analysis 110
4.9 Applying Factor Analysis to the Adult Data Set 111
4.10 Factor Rotation 114
4.11 User-Defined Composites 117
4.12 An Example of a User-Defined Composite 118

PART II STATISTICAL ANALYSIS 129

CHAPTER 5 UNIVARIATE STATISTICAL ANALYSIS 131
5.1 Data Mining Tasks in Discovering Knowledge in Data 131
5.2 Statistical Approaches to Estimation and Prediction 131
5.3 Statistical Inference 132
5.4 How Confident are We in Our Estimates? 133
5.5 Confidence Interval Estimation of the Mean 134
5.6 How to Reduce the Margin of Error 136
5.7 Confidence Interval Estimation of the Proportion 137
5.8 Hypothesis Testing for the Mean 138
5.9 Assessing the Strength of Evidence Against the Null Hypothesis 140
5.10 Using Confidence Intervals to Perform Hypothesis Tests 141
5.11 Hypothesis Testing for the Proportion 143

CHAPTER 6 MULTIVARIATE STATISTICS 148
6.1 Two-Sample t-Test for Difference in Means 148
6.2 Two-Sample Z-Test for Difference in Proportions 149
6.3 Test for the Homogeneity of Proportions 150
6.4 Chi-Square Test for Goodness of Fit of Multinomial Data 152
6.5 Analysis of Variance 153

CHAPTER 7 PREPARING TO MODEL THE DATA 160
7.1 Supervised Versus Unsupervised Methods 160
7.2 Statistical Methodology and Data Mining Methodology 161
7.3 Cross-Validation 161
7.4 Overfitting 163
7.5 Bias–Variance Trade-Off 164
7.6 Balancing the Training Data Set 166
7.7 Establishing Baseline Performance 167

CHAPTER 8 SIMPLE LINEAR REGRESSION 171
8.1 An Example of Simple Linear Regression 171
8.2 Dangers of Extrapolation 177
8.3 How Useful is the Regression? The Coefficient of Determination, r² 178
8.4 Standard Error of the Estimate, s 183
8.5 Correlation Coefficient r 184
8.6 ANOVA Table for Simple Linear Regression 186
8.7 Outliers, High Leverage Points, and Influential Observations 186
8.8 Population Regression Equation 195
8.9 Verifying the Regression Assumptions 198
8.10 Inference in Regression 203
8.11 t-Test for the Relationship Between x and y 204
8.12 Confidence Interval for the Slope of the Regression Line 206
8.13 Confidence Interval for the Correlation Coefficient ρ 208
8.14 Confidence Interval for the Mean Value of y Given x 210
8.15 Prediction Interval for a Randomly Chosen Value of y Given x 211
8.16 Transformations to Achieve Linearity 213
8.17 Box–Cox Transformations 220

CHAPTER 9 MULTIPLE REGRESSION AND MODEL BUILDING 236
9.1 An Example of Multiple Regression 236
9.2 The Population Multiple Regression Equation 242
9.3 Inference in Multiple Regression 243
9.4 Regression with Categorical Predictors, Using Indicator Variables 249
9.5 Adjusting R²: Penalizing Models for Including Predictors that are not Useful 256
9.6 Sequential Sums of Squares 257
9.7 Multicollinearity 258
9.8 Variable Selection Methods 266
9.9 Gas Mileage Data Set 270
9.10 An Application of Variable Selection Methods 271
9.11 Using the Principal Components as Predictors in Multiple Regression 279

PART III CLASSIFICATION 299

CHAPTER 10 k-NEAREST NEIGHBOR ALGORITHM 301
10.1 Classification Task 301
10.2 k-Nearest Neighbor Algorithm 302
10.3 Distance Function 305
10.4 Combination Function 307
10.5 Quantifying Attribute Relevance: Stretching the Axes 309
10.6 Database Considerations 310
10.7 k-Nearest Neighbor Algorithm for Estimation and Prediction 310
10.8 Choosing k 311
10.9 Application of k-Nearest Neighbor Algorithm Using IBM/SPSS Modeler 312

CHAPTER 11 DECISION TREES 317
11.1 What is a Decision Tree? 317
11.2 Requirements for Using Decision Trees 319
11.3 Classification and Regression Trees 319
11.4 C4.5 Algorithm 326
11.5 Decision Rules 332
11.6 Comparison of the C5.0 and CART Algorithms Applied to Real Data 332

CHAPTER 12 NEURAL NETWORKS 339
12.1 Input and Output Encoding 339
12.2 Neural Networks for Estimation and Prediction 342
12.3 Simple Example of a Neural Network 342
12.4 Sigmoid Activation Function 344
12.5 Back-Propagation 345
12.6 Gradient-Descent Method 346
12.7 Back-Propagation Rules 347
12.8 Example of Back-Propagation 347
12.9 Termination Criteria 349
12.10 Learning Rate 350
12.11 Momentum Term 351
12.12 Sensitivity Analysis 353
12.13 Application of Neural Network Modeling 353

CHAPTER 13 LOGISTIC REGRESSION 359
13.1 Simple Example of Logistic Regression 359
13.2 Maximum Likelihood Estimation 361
13.3 Interpreting Logistic Regression Output 362
13.4 Inference: Are the Predictors Significant? 363
13.5 Odds Ratio and Relative Risk 365
13.6 Interpreting Logistic Regression for a Dichotomous Predictor 367
13.7 Interpreting Logistic Regression for a Polychotomous Predictor 370
13.8 Interpreting Logistic Regression for a Continuous Predictor 374
13.9 Assumption of Linearity 378
13.10 Zero-Cell Problem 382
13.11 Multiple Logistic Regression 384
13.12 Introducing Higher Order Terms to Handle Nonlinearity 388
13.13 Validating the Logistic Regression Model 395
13.14 WEKA: Hands-On Analysis Using Logistic Regression 399

CHAPTER 14 NAÏVE BAYES AND BAYESIAN NETWORKS 414
14.1 Bayesian Approach 414
14.2 Maximum a Posteriori (MAP) Classification 416
14.3 Posterior Odds Ratio 420
14.4 Balancing the Data 422
14.5 Naïve Bayes Classification 423
14.6 Interpreting the Log Posterior Odds Ratio 426
14.7 Zero-Cell Problem 428
14.8 Numeric Predictors for Naïve Bayes Classification 429
14.9 WEKA: Hands-On Analysis Using Naïve Bayes 432
14.10 Bayesian Belief Networks 436
14.11 Clothing Purchase Example 436
14.12 Using the Bayesian Network to Find Probabilities 439

CHAPTER 15 MODEL EVALUATION TECHNIQUES 451
15.1 Model Evaluation Techniques for the Description Task 451
15.2 Model Evaluation Techniques for the Estimation and Prediction Tasks 452
15.3 Model Evaluation Measures for the Classification Task 454
15.4 Accuracy and Overall Error Rate 456
15.5 Sensitivity and Specificity 457
15.6 False-Positive Rate and False-Negative Rate 458
15.7 Proportions of True Positives, True Negatives, False Positives, and False Negatives 458
15.8 Misclassification Cost Adjustment to Reflect Real-World Concerns 460
15.9 Decision Cost/Benefit Analysis 462
15.10 Lift Charts and Gains Charts 463
15.11 Interweaving Model Evaluation with Model Building 466
15.12 Confluence of Results: Applying a Suite of Models 466

CHAPTER 16 COST-BENEFIT ANALYSIS USING DATA-DRIVEN COSTS 471
16.1 Decision Invariance Under Row Adjustment 471
16.2 Positive Classification Criterion 473
16.3 Demonstration of the Positive Classification Criterion 474
16.4 Constructing the Cost Matrix 474
16.5 Decision Invariance Under Scaling 476
16.6 Direct Costs and Opportunity Costs 478
16.7 Case Study: Cost-Benefit Analysis Using Data-Driven Misclassification Costs 478
16.8 Rebalancing as a Surrogate for Misclassification Costs 483

CHAPTER 17 COST-BENEFIT ANALYSIS FOR TRINARY AND k-NARY CLASSIFICATION MODELS 491
17.1 Classification Evaluation Measures for a Generic Trinary Target 491
17.2 Application of Evaluation Measures for Trinary Classification to the Loan Approval Problem 494
17.3 Data-Driven Cost-Benefit Analysis for Trinary Loan Classification Problem 498
17.4 Comparing CART Models with and without Data-Driven Misclassification Costs 500
17.5 Classification Evaluation Measures for a Generic k-Nary Target 503
17.6 Example of Evaluation Measures and Data-Driven Misclassification Costs for k-Nary Classification 504

CHAPTER 18 GRAPHICAL EVALUATION OF CLASSIFICATION MODELS 510
18.1 Review of Lift Charts and Gains Charts 510
18.2 Lift Charts and Gains Charts Using Misclassification Costs 510
18.3 Response Charts 511
18.4 Profits Charts 512
18.5 Return on Investment (ROI) Charts 514

PART IV CLUSTERING 521

CHAPTER 19 HIERARCHICAL AND k-MEANS CLUSTERING 523
19.1 The Clustering Task 523
19.2 Hierarchical Clustering Methods 525
19.3 Single-Linkage Clustering 526
19.4 Complete-Linkage Clustering 527
19.5 k-Means Clustering 529
19.6 Example of k-Means Clustering at Work 530
19.7 Behavior of MSB, MSE, and Pseudo-F as the k-Means Algorithm Proceeds 533
19.8 Application of k-Means Clustering Using SAS Enterprise Miner 534
19.9 Using Cluster Membership to Predict Churn 537

CHAPTER 20 KOHONEN NETWORKS 542
20.1 Self-Organizing Maps 542
20.2 Kohonen Networks 544
20.3 Example of a Kohonen Network Study 545
20.4 Cluster Validity 549
20.5 Application of Clustering Using Kohonen Networks 549
20.6 Interpreting the Clusters 551
20.7 Using Cluster Membership as Input to Downstream Data Mining Models 556

CHAPTER 21 BIRCH CLUSTERING 560
21.1 Rationale for BIRCH Clustering 560
21.2 Cluster Features 561
21.3 Cluster Feature Tree 562
21.4 Phase 1: Building the CF Tree 562
21.5 Phase 2: Clustering the Sub-Clusters 564
21.6 Example of BIRCH Clustering, Phase 1: Building the CF Tree 565
21.7 Example of BIRCH Clustering, Phase 2: Clustering the Sub-Clusters 570
21.8 Evaluating the Candidate Cluster Solutions 571
21.9 Case Study: Applying BIRCH Clustering to the Bank Loans Data Set 571

CHAPTER 22 MEASURING CLUSTER GOODNESS 582
22.1 Rationale for Measuring Cluster Goodness 582
22.2 The Silhouette Method 583
22.3 Silhouette Example 584
22.4 Silhouette Analysis of the IRIS Data Set 585
22.5 The Pseudo-F Statistic 590
22.6 Example of the Pseudo-F Statistic 591
22.7 Pseudo-F Statistic Applied to the IRIS Data Set 592
22.8 Cluster Validation 593
22.9 Cluster Validation Applied to the Loans Data Set 594

PART V ASSOCIATION RULES 601

CHAPTER 23 ASSOCIATION RULES 603
23.1 Affinity Analysis and Market Basket Analysis 603
23.2 Support, Confidence, Frequent Itemsets, and the A Priori Property 605
23.3 How Does the A Priori Algorithm Work (Part 1)? Generating Frequent Itemsets 607
23.4 How Does the A Priori Algorithm Work (Part 2)? Generating Association Rules 608
23.5 Extension from Flag Data to General Categorical Data 611
23.6 Information-Theoretic Approach: Generalized Rule Induction Method 612
23.7 Association Rules are Easy to do Badly 614
23.8 How Can We Measure the Usefulness of Association Rules? 615
23.9 Do Association Rules Represent Supervised or Unsupervised Learning? 616
23.10 Local Patterns Versus Global Models 617

PART VI ENHANCING MODEL PERFORMANCE 623

CHAPTER 24 SEGMENTATION MODELS 625
24.1 The Segmentation Modeling Process 625
24.2 Segmentation Modeling Using EDA to Identify the Segments 627
24.3 Segmentation Modeling Using Clustering to Identify the Segments 629

CHAPTER 25 ENSEMBLE METHODS: BAGGING AND BOOSTING 637
25.1 Rationale for Using an Ensemble of Classification Models 637
25.2 Bias, Variance, and Noise 639
25.3 When to Apply, and not to Apply, Bagging 640
25.4 Bagging 641
25.5 Boosting 643
25.6 Application of Bagging and Boosting Using IBM/SPSS Modeler 647

CHAPTER 26 MODEL VOTING AND PROPENSITY AVERAGING 653
26.1 Simple Model Voting 653
26.2 Alternative Voting Methods 654
26.3 Model Voting Process 655
26.4 An Application of Model Voting 656
26.5 What is Propensity Averaging? 660
26.6 Propensity Averaging Process 661
26.7 An Application of Propensity Averaging 661

PART VII FURTHER TOPICS 669

CHAPTER 27 GENETIC ALGORITHMS 671
27.1 Introduction to Genetic Algorithms 671
27.2 Basic Framework of a Genetic Algorithm 672
27.3 Simple Example of a Genetic Algorithm at Work 673
27.4 Modifications and Enhancements: Selection 676
27.5 Modifications and Enhancements: Crossover 678
27.6 Genetic Algorithms for Real-Valued Variables 679
27.7 Using Genetic Algorithms to Train a Neural Network 681
27.8 WEKA: Hands-On Analysis Using Genetic Algorithms 684

CHAPTER 28 IMPUTATION OF MISSING DATA 695
28.1 Need for Imputation of Missing Data 695
28.2 Imputation of Missing Data: Continuous Variables 696
28.3 Standard Error of the Imputation 699
28.4 Imputation of Missing Data: Categorical Variables 700
28.5 Handling Patterns in Missingness 701

PART VIII CASE STUDY: PREDICTING RESPONSE TO DIRECT-MAIL MARKETING 705

CHAPTER 29 CASE STUDY, PART 1: BUSINESS UNDERSTANDING, DATA PREPARATION, AND EDA 707
29.1 Cross-Industry Standard Process for Data Mining 707
29.2 Business Understanding Phase 709
29.3 Data Understanding Phase, Part 1: Getting a Feel for the Data Set 710
29.4 Data Preparation Phase 714
29.5 Data Understanding Phase, Part 2: Exploratory Data Analysis 721

CHAPTER 30 CASE STUDY, PART 2: CLUSTERING AND PRINCIPAL COMPONENTS ANALYSIS 732
30.1 Partitioning the Data 732
30.2 Developing the Principal Components 733
30.3 Validating the Principal Components 737
30.4 Profiling the Principal Components 737
30.5 Choosing the Optimal Number of Clusters Using BIRCH Clustering 742
30.6 Choosing the Optimal Number of Clusters Using k-Means Clustering 744
30.7 Application of k-Means Clustering 745
30.8 Validating the Clusters 745
30.9 Profiling the Clusters 745

CHAPTER 31 CASE STUDY, PART 3: MODELING AND EVALUATION FOR PERFORMANCE AND INTERPRETABILITY 749
31.1 Do You Prefer the Best Model Performance, or a Combination of Performance and Interpretability? 749
31.2 Modeling and Evaluation Overview 750
31.3 Cost-Benefit Analysis Using Data-Driven Costs 751
31.4 Variables to be Input to the Models 753
31.5 Establishing the Baseline Model Performance 754
31.6 Models that Use Misclassification Costs 755
31.7 Models that Need Rebalancing as a Surrogate for Misclassification Costs 756
31.8 Combining Models Using Voting and Propensity Averaging 757
31.9 Interpreting the Most Profitable Model 758

CHAPTER 32 CASE STUDY, PART 4: MODELING AND EVALUATION FOR HIGH PERFORMANCE ONLY 762
32.1 Variables to be Input to the Models 762
32.2 Models that Use Misclassification Costs 762
32.3 Models that Need Rebalancing as a Surrogate for Misclassification Costs 764
32.4 Combining Models Using Voting and Propensity Averaging 765
32.5 Lessons Learned 766
32.6 Conclusions 766

APPENDIX A DATA SUMMARIZATION AND VISUALIZATION 768
Part 1: Summarization 1: Building Blocks of Data Analysis 768
Part 2: Visualization: Graphs and Tables for Summarizing and Organizing Data 770
Part 3: Summarization 2: Measures of Center, Variability, and Position 774
Part 4: Summarization and Visualization of Bivariate Relationships 777

INDEX 781
Daniel T. Larose is Professor of Mathematical Sciences and Director of the Data Mining programs at Central Connecticut State University. He has published several books, including Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage (Wiley, 2007) and Discovering Knowledge in Data: An Introduction to Data Mining (Wiley, 2005). In addition to his scholarly work, Dr. Larose is a consultant in data mining and statistical analysis who has worked with many high-profile clients, including Microsoft, Forbes Magazine, the CIT Group, KPMG International, Computer Associates, and Deloitte, Inc.

Chantal D. Larose is an Assistant Professor of Statistics & Data Science at Eastern Connecticut State University (ECSU). She has co-authored three books on data science and predictive analytics. She helped develop data science programs at ECSU and at SUNY New Paltz. She received her PhD in Statistics from the University of Connecticut, Storrs, in 2015 (dissertation title: Model-Based Clustering of Incomplete Data).

You may also be interested in these products:

MDX Solutions
by: George Spofford, Sivakumar Harinath, Christopher Webb, Dylan Hai Huang, Francesco Civardi
PDF ebook
53,99 €
Concept Data Analysis
by: Claudio Carpineto, Giovanni Romano
PDF ebook
107,99 €
Handbook of Virtual Humans
by: Nadia Magnenat-Thalmann, Daniel Thalmann
PDF ebook
150,99 €