Applied Predictive Analytics


Principles and Techniques for the Professional Data Analyst
1st edition

by: Dean Abbott

38,99 €

Publisher: Wiley
Format: PDF
Published: 28 March 2014
ISBN/EAN: 9781118727935
Language: English
Number of pages: 464

DRM-protected eBook; to read it you need, for example, Adobe Digital Editions and an Adobe ID.

Descriptions

<p><b>Learn the art and science of predictive analytics: techniques that get results</b></p> <p>Predictive analytics is what translates big data into meaningful, usable business information. Written by a leading expert in the field, this guide examines the science of the underlying algorithms as well as the principles and best practices that govern the art of predictive analytics. It clearly explains the theory behind predictive analytics, teaches the methods, principles, and techniques for conducting predictive analytics projects, and offers tips and tricks that are essential for successful predictive modeling. Hands-on examples and case studies are included.</p> <ul> <li>The ability to apply predictive analytics successfully enables businesses to interpret big data effectively, which is essential for competing today</li> <li>This guide teaches not only the principles of predictive analytics, but also how to apply them to achieve real, pragmatic solutions</li> <li>Explains methods, principles, and techniques for conducting predictive analytics projects from start to finish</li> <li>Illustrates each technique with hands-on examples and includes a series of in-depth case studies that apply predictive analytics to common business scenarios</li> <li>A companion website provides all the data sets used to generate the examples as well as a free trial version of software</li> </ul> <p><i>Applied Predictive Analytics</i> arms data analysts, business analysts, and business managers with the tools they need to interpret and capitalize on big data.</p>
<p>Introduction xxi</p> <p><b>Chapter 1 Overview of Predictive Analytics 1</b></p> <p>What Is Analytics? 3</p> <p>What Is Predictive Analytics? 3</p> <p>Supervised vs. Unsupervised Learning 5</p> <p>Parametric vs. Non-Parametric Models 6</p> <p>Business Intelligence 6</p> <p>Predictive Analytics vs. Business Intelligence 8</p> <p>Do Predictive Models Just State the Obvious? 9</p> <p>Similarities between Business Intelligence and Predictive Analytics 9</p> <p>Predictive Analytics vs. Statistics 10</p> <p>Statistics and Analytics 11</p> <p>Predictive Analytics and Statistics Contrasted 12</p> <p>Predictive Analytics vs. Data Mining 13</p> <p>Who Uses Predictive Analytics? 13</p> <p>Challenges in Using Predictive Analytics 14</p> <p>Obstacles in Management 14</p> <p>Obstacles with Data 14</p> <p>Obstacles with Modeling 15</p> <p>Obstacles in Deployment 16</p> <p>What Educational Background Is Needed to Become a Predictive Modeler? 16</p> <p><b>Chapter 2 Setting Up the Problem 19</b></p> <p>Predictive Analytics Processing Steps: CRISP-DM 19</p> <p>Business Understanding 21</p> <p>The Three-Legged Stool 22</p> <p>Business Objectives 23</p> <p>Defining Data for Predictive Modeling 25</p> <p>Defining the Columns as Measures 26</p> <p>Defining the Unit of Analysis 27</p> <p>Which Unit of Analysis? 
28</p> <p>Defining the Target Variable 29</p> <p>Temporal Considerations for Target Variable 31</p> <p>Defining Measures of Success for Predictive Models 32</p> <p>Success Criteria for Classification 32</p> <p>Success Criteria for Estimation 33</p> <p>Other Customized Success Criteria 33</p> <p>Doing Predictive Modeling Out of Order 34</p> <p>Building Models First 34</p> <p>Early Model Deployment 35</p> <p>Case Study: Recovering Lapsed Donors 35</p> <p>Overview 36</p> <p>Business Objectives 36</p> <p>Data for the Competition 36</p> <p>The Target Variables 36</p> <p>Modeling Objectives 37</p> <p>Model Selection and Evaluation Criteria 38</p> <p>Model Deployment 39</p> <p>Case Study: Fraud Detection 39</p> <p>Overview 39</p> <p>Business Objectives 39</p> <p>Data for the Project 40</p> <p>The Target Variables 40</p> <p>Modeling Objectives 41</p> <p>Model Selection and Evaluation Criteria 41</p> <p>Model Deployment 41</p> <p>Summary 42</p> <p><b>Chapter 3 Data Understanding 43</b></p> <p>What the Data Looks Like 44</p> <p>Single Variable Summaries 44</p> <p>Mean 45</p> <p>Standard Deviation 45</p> <p>The Normal Distribution 45</p> <p>Uniform Distribution 46</p> <p>Applying Simple Statistics in Data Understanding 47</p> <p>Skewness 49</p> <p>Kurtosis 51</p> <p>Rank-Ordered Statistics 52</p> <p>Categorical Variable Assessment 55</p> <p>Data Visualization in One Dimension 58</p> <p>Histograms 59</p> <p>Multiple Variable Summaries 64</p> <p>Hidden Value in Variable Interactions: Simpson’s Paradox 64</p> <p>The Combinatorial Explosion of Interactions 65</p> <p>Correlations 66</p> <p>Spurious Correlations 66</p> <p>Back to Correlations 67</p> <p>Crosstabs 68</p> <p>Data Visualization, Two or Higher Dimensions 69</p> <p>Scatterplots 69</p> <p>Anscombe’s Quartet 71</p> <p>Scatterplot Matrices 75</p> <p>Overlaying the Target Variable in Summary 76</p> <p>Scatterplots in More Than Two Dimensions 78</p> <p>The Value of Statistical Significance 80</p> <p>Pulling It All Together 
into a Data Audit 81</p> <p>Summary 82</p> <p><b>Chapter 4 Data Preparation 83</b></p> <p>Variable Cleaning 84</p> <p>Incorrect Values 84</p> <p>Consistency in Data Formats 85</p> <p>Outliers 85</p> <p>Multidimensional Outliers 89</p> <p>Missing Values 90</p> <p>Fixing Missing Data 91</p> <p>Feature Creation 98</p> <p>Simple Variable Transformations 98</p> <p>Fixing Skew 99</p> <p>Binning Continuous Variables 103</p> <p>Numeric Variable Scaling 104</p> <p>Nominal Variable Transformation 107</p> <p>Ordinal Variable Transformations 108</p> <p>Date and Time Variable Features 109</p> <p>ZIP Code Features 110</p> <p>Which Version of a Variable Is Best? 110</p> <p>Multidimensional Features 112</p> <p>Variable Selection Prior to Modeling 117</p> <p>Sampling 123</p> <p>Example: Why Normalization Matters for K-Means Clustering 139</p> <p>Summary 143</p> <p><b>Chapter 5 Itemsets and Association Rules 145</b></p> <p>Terminology 146</p> <p>Condition 147</p> <p>Left-Hand-Side, Antecedent(s) 148</p> <p>Right-Hand-Side, Consequent, Output, Conclusion 148</p> <p>Rule (Item Set) 148</p> <p>Support 149</p> <p>Antecedent Support 149</p> <p>Confidence, Accuracy 150</p> <p>Lift 150</p> <p>Parameter Settings 151</p> <p>How the Data Is Organized 151</p> <p>Standard Predictive Modeling Data Format 151</p> <p>Transactional Format 152</p> <p>Measures of Interesting Rules 154</p> <p>Deploying Association Rules 156</p> <p>Variable Selection 157</p> <p>Interaction Variable Creation 157</p> <p>Problems with Association Rules 158</p> <p>Redundant Rules 158</p> <p>Too Many Rules 158</p> <p>Too Few Rules 159</p> <p>Building Classification Rules from Association Rules 159</p> <p>Summary 161</p> <p><b>Chapter 6 Descriptive Modeling 163</b></p> <p>Data Preparation Issues with Descriptive Modeling 164</p> <p>Principal Component Analysis 165</p> <p>The PCA Algorithm 165</p> <p>Applying PCA to New Data 169</p> <p>PCA for Data Interpretation 171</p> <p>Additional Considerations before Using PCA 172</p> 
<p>The Effect of Variable Magnitude on PCA Models 174</p> <p>Clustering Algorithms 177</p> <p>The K-Means Algorithm 178</p> <p>Data Preparation for K-Means 183</p> <p>Selecting the Number of Clusters 185</p> <p>The Kohonen SOM Algorithm 192</p> <p>Visualizing Kohonen Maps 194</p> <p>Similarities with K-Means 196</p> <p>Summary 197</p> <p><b>Chapter 7 Interpreting Descriptive Models 199</b></p> <p>Standard Cluster Model Interpretation 199</p> <p>Problems with Interpretation Methods 202</p> <p>Identifying Key Variables in Forming Cluster Models 203</p> <p>Cluster Prototypes 209</p> <p>Cluster Outliers 210</p> <p>Summary 212</p> <p><b>Chapter 8 Predictive Modeling 213</b></p> <p>Decision Trees 214</p> <p>The Decision Tree Landscape 215</p> <p>Building Decision Trees 218</p> <p>Decision Tree Splitting Metrics 221</p> <p>Decision Tree Knobs and Options 222</p> <p>Reweighting Records: Priors 224</p> <p>Reweighting Records: Misclassification Costs 224</p> <p>Other Practical Considerations for Decision Trees 229</p> <p>Logistic Regression 230</p> <p>Interpreting Logistic Regression Models 233</p> <p>Other Practical Considerations for Logistic Regression 235</p> <p>Neural Networks 240</p> <p>Building Blocks: The Neuron 242</p> <p>Neural Network Training 244</p> <p>The Flexibility of Neural Networks 247</p> <p>Neural Network Settings 249</p> <p>Neural Network Pruning 251</p> <p>Interpreting Neural Networks 252</p> <p>Neural Network Decision Boundaries 253</p> <p>Other Practical Considerations for Neural Networks 253</p> <p>K-Nearest Neighbor 254</p> <p>The k-NN Learning Algorithm 254</p> <p>Distance Metrics for k-NN 258</p> <p>Other Practical Considerations for k-NN 259</p> <p>Naïve Bayes 264</p> <p>Bayes’ Theorem 264</p> <p>The Naïve Bayes Classifier 268</p> <p>Interpreting Naïve Bayes Classifiers 268</p> <p>Other Practical Considerations for Naïve Bayes 269</p> <p>Regression Models 270</p> <p>Linear Regression 271</p> <p>Linear Regression Assumptions 274</p> <p>Variable 
Selection in Linear Regression 276</p> <p>Interpreting Linear Regression Models 278</p> <p>Using Linear Regression for Classification 279</p> <p>Other Regression Algorithms 280</p> <p>Summary 281</p> <p><b>Chapter 9 Assessing Predictive Models 283</b></p> <p>Batch Approach to Model Assessment 284</p> <p>Percent Correct Classification 284</p> <p>Rank-Ordered Approach to Model Assessment 293</p> <p>Assessing Regression Models 301</p> <p>Summary 304</p> <p><b>Chapter 10 Model Ensembles 307</b></p> <p>Motivation for Ensembles 307</p> <p>The Wisdom of Crowds 308</p> <p>Bias Variance Tradeoff 309</p> <p>Bagging 311</p> <p>Boosting 316</p> <p>Improvements to Bagging and Boosting 320</p> <p>Random Forests 320</p> <p>Stochastic Gradient Boosting 321</p> <p>Heterogeneous Ensembles 321</p> <p>Model Ensembles and Occam’s Razor 323</p> <p>Interpreting Model Ensembles 323</p> <p>Summary 326</p> <p><b>Chapter 11 Text Mining 327</b></p> <p>Motivation for Text Mining 328</p> <p>A Predictive Modeling Approach to Text Mining 329</p> <p>Structured vs. 
Unstructured Data 329</p> <p>Why Text Mining Is Hard 330</p> <p>Text Mining Applications 332</p> <p>Data Sources for Text Mining 333</p> <p>Data Preparation Steps 333</p> <p>POS Tagging 333</p> <p>Tokens 336</p> <p>Stop Word and Punctuation Filters 336</p> <p>Character Length and Number Filters 337</p> <p>Stemming 337</p> <p>Dictionaries 338</p> <p>The Sentiment Polarity Movie Data Set 339</p> <p>Text Mining Features 340</p> <p>Term Frequency 341</p> <p>Inverse Document Frequency 344</p> <p>Tf-idf 344</p> <p>Cosine Similarity 346</p> <p>Multi-Word Features: N-Grams 346</p> <p>Reducing Keyword Features 347</p> <p>Grouping Terms 347</p> <p>Modeling with Text Mining Features 347</p> <p>Regular Expressions 349</p> <p>Uses of Regular Expressions in Text Mining 351</p> <p>Summary 352</p> <p><b>Chapter 12 Model Deployment 353</b></p> <p>General Deployment Considerations 354</p> <p>Deployment Steps 355</p> <p>Summary 375</p> <p><b>Chapter 13 Case Studies 377</b></p> <p>Survey Analysis Case Study: Overview 377</p> <p>Business Understanding: Defining the Problem 378</p> <p>Data Understanding 380</p> <p>Data Preparation 381</p> <p>Modeling 385</p> <p>Deployment: “What-If” Analysis 391</p> <p>Revisit Models 392</p> <p>Deployment 401</p> <p>Summary and Conclusions 401</p> <p>Help Desk Case Study 402</p> <p>Data Understanding: Defining the Data 403</p> <p>Data Preparation 403</p> <p>Modeling 405</p> <p>Revisit Business Understanding 407</p> <p>Deployment 409</p> <p>Summary and Conclusions 411</p> <p>Index 413</p>
<p><b>DEAN ABBOTT</b> is President of Abbott Analytics, Inc. (San Diego). He is an internationally recognized data mining and predictive analytics expert with over two decades of experience in fraud detection, risk modeling, text mining, personality assessment, planned giving, toxicology, and other applications. He is also Chief Scientist of SmarterRemarketer, a company focusing on behaviorally driven and data-driven marketing and web analytics.</p>
<p><b>APPLY THE RIGHT ANALYTIC TECHNIQUE</b></p> <p>Applied Predictive Analytics: Principles and Techniques for the Professional Data Analyst shows tech-savvy business managers and data analysts how to use predictive analytics to solve practical business problems. It teaches readers the methods, principles, and techniques for conducting predictive analytics projects, from start to finish. Internationally recognized data mining and predictive analytics expert Dean Abbott provides a practical and authoritative guide to best practices for successful predictive modeling, including expert tips and tricks to avoid common pitfalls.</p> <p>This book explains the theory behind the principles of predictive analytics in plain English; readers don’t need an extensive background in math and statistics, which makes it ideal for most tech-savvy business and data analysts. Each of the chapters describes one or more specific techniques and how they relate to the overall process model for predictive analytics. The depth of the description of a technique will match the complexity of the approach, with the intent to describe the techniques in enough depth for a practitioner to understand the effect of the major parameters needed to effectively use the technique and interpret the results.</p> <p>Each of the techniques is illustrated by examples, either unique to the task or as part of predictive modeling competitions. 
The companion website will provide all of the data sets used to generate these examples, along with links to open source and commercial software, so that readers can recreate and explore the examples.</p> <p><b>With detailed descriptions of techniques that get results, <i>Applied Predictive Analytics</i> shows you how to:</b></p> <ul> <li><b>Choose the proper analytics technique for various scenarios</b></li> <li><b>Avoid common mistakes and identify the weaknesses of various techniques</b></li> <li><b>Mitigate outliers and fill in missing data when necessary</b></li> <li><b>Interpret predictive models often considered “black boxes,” including model ensembles</b></li> <li><b>Assess model performance so the best model is selected</b></li> <li><b>Apply the appropriate sampling techniques for building and updating models</b></li> </ul>

You might also be interested in these products:

MDX Solutions
by: George Spofford, Sivakumar Harinath, Christopher Webb, Dylan Hai Huang, Francesco Civardi
PDF ebook
53,99 €
Concept Data Analysis
by: Claudio Carpineto, Giovanni Romano
PDF ebook
107,99 €
Handbook of Virtual Humans
by: Nadia Magnenat-Thalmann, Daniel Thalmann
PDF ebook
150,99 €