Data Mining for Business Analytics


Concepts, Techniques and Applications in Python
1st ed.

by: Galit Shmueli, Peter C. Bruce, Peter Gedeck, Nitin R. Patel

103,99 €

Publisher: Wiley
Format: PDF
Published: 14 October 2019
ISBN/EAN: 9781119549857
Language: English
Number of pages: 608

DRM-protected eBook; to read it you need, for example, Adobe Digital Editions and an Adobe ID.

Descriptions

<p><b><i>Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python</i></b><b> presents an applied approach to data mining concepts and methods, using Python software for illustration</b></p> <p>Readers will learn how to implement a variety of popular data mining algorithms in Python (free and open-source software) to tackle business problems and opportunities.</p> <p>This is the sixth version of this successful text, and the first using Python. It covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, recommender systems, clustering, text mining, and network analysis. It also includes:</p> <ul> <li>A new co-author, Peter Gedeck, who brings both experience teaching business analytics courses using Python and expertise in the application of machine learning methods to the drug-discovery process</li> <li>A new section on ethical issues in data mining</li> <li>Updates and new material based on feedback from instructors teaching MBA, undergraduate, diploma, and executive courses, and from their students</li> <li>More than a dozen case studies demonstrating applications for the data mining techniques described</li> <li>End-of-chapter exercises that help readers gauge and expand their comprehension of, and competency with, the material presented</li> <li>A companion website with more than two dozen data sets, and instructor materials including exercise solutions, PowerPoint slides, and case solutions</li> </ul> <p><i>Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python</i> is an ideal textbook for graduate and upper-undergraduate level courses in data mining, predictive analytics, and business analytics.
This new edition is also an excellent reference for analysts, researchers, and practitioners working with quantitative methods in the fields of business, finance, marketing, computer science, and information technology.</p> <p>“This book has by far the most comprehensive review of business analytics methods that I have ever seen, covering everything from classical approaches such as linear and logistic regression, through to modern methods like neural networks, bagging and boosting, and even much more business specific procedures such as social network analysis and text mining. If not the bible, it is at the least a definitive manual on the subject.”</p> <p>—Gareth M. James, University of Southern California and co-author (with Witten, Hastie and Tibshirani) of the best-selling book <i>An Introduction to Statistical Learning, with Applications in R </i></p>
<p>Foreword by <i>Gareth James</i></p> <p>Foreword by <i>Ravi Bapna</i></p> <p>Preface to the Python Edition</p> <p>Acknowledgments</p>
<p><b>Part I Preliminaries</b></p>
<p><b>Chapter 1</b> <b>Introduction</b></p> <p>1.1 What is Business Analytics?</p> <p>1.2 What is Data Mining?</p> <p>1.3 Data Mining and Related Terms</p> <p>1.4 Big Data</p> <p>1.5 Data Science</p> <p>1.6 Why are There So Many Different Methods?</p> <p>1.7 Terminology and Notation</p> <p>1.8 Road Maps to This Book</p>
<p><b>Chapter 2</b> <b>Overview of the Data Mining Process</b></p> <p>2.1 Introduction</p> <p>2.2 Core Ideas in Data Mining</p> <p>2.3 The Steps in Data Mining</p> <p>2.4 Preliminary Steps</p> <p>2.5 Predictive Power and Overfitting</p> <p>2.6 Building a Predictive Model</p> <p>2.7 Using Python for Data Mining on a Local Machine</p> <p>2.8 Automating Data Mining Solutions</p> <p>2.9 Ethical Practice in Data Mining</p> <p>Problems</p>
<p><b>Part II Data Exploration and Dimension Reduction</b></p>
<p><b>Chapter 3</b> <b>Data Visualization</b></p> <p>3.1 Introduction</p> <p>3.2 Data Examples</p> <p>3.3 Basic Charts: Bar Charts, Line Graphs, and Scatter Plots</p> <p>3.4 Multidimensional Visualization</p> <p>3.5 Specialized Visualizations</p> <p>3.6 Summary: Major Visualizations and Operations, by Data Mining Goal</p> <p>Problems</p>
<p><b>Chapter 4</b> <b>Dimension Reduction</b></p> <p>4.1 Introduction</p> <p>4.2 Curse of Dimensionality</p> <p>4.3 Practical Considerations</p> <p>4.4 Data Summaries</p> <p>4.5 Correlation Analysis</p> <p>4.6 Reducing the Number of Categories in Categorical Variables</p> <p>4.7 Converting a Categorical Variable to a Numerical Variable</p> <p>4.8 Principal Components Analysis</p> <p>4.9 Dimension Reduction Using Regression Models</p> <p>4.10 Dimension Reduction Using Classification and Regression Trees</p> <p>Problems</p>
<p><b>Part III Performance Evaluation</b></p>
<p><b>Chapter 5</b> <b>Evaluating Predictive Performance</b></p> <p>5.1 Introduction</p> <p>5.2 Evaluating Predictive Performance</p> <p>5.3 Judging Classifier Performance</p> <p>5.4 Judging Ranking Performance</p> <p>5.5 Oversampling</p> <p>Problems</p>
<p><b>Part IV Prediction and Classification Methods</b></p>
<p><b>Chapter 6</b> <b>Multiple Linear Regression</b></p> <p>6.1 Introduction</p> <p>6.2 Explanatory vs. Predictive Modeling</p> <p>6.3 Estimating the Regression Equation and Prediction</p> <p>6.4 Variable Selection in Linear Regression</p> <p>Appendix: Using Statmodels</p> <p>Problems</p>
<p><b>Chapter 7</b> <b><i>k</i>-Nearest Neighbors (<i>k</i>NN)</b></p> <p>7.1 The <i>k</i>-NN Classifier (Categorical Outcome)</p> <p>7.2 <i>k</i>-NN for a Numerical Outcome</p> <p>7.3 Advantages and Shortcomings of <i>k</i>-NN Algorithms</p> <p>Problems</p>
<p><b>Chapter 8</b> <b>The Naive Bayes Classifier</b></p> <p>8.1 Introduction</p> <p>Example 1: Predicting Fraudulent Financial Reporting</p> <p>8.2 Applying the Full (Exact) Bayesian Classifier</p> <p>8.3 Advantages and Shortcomings of the Naive Bayes Classifier</p> <p>Problems</p>
<p><b>Chapter 9</b> <b>Classification and Regression Trees</b></p> <p>9.1 Introduction</p> <p>9.2 Classification Trees</p> <p>9.3 Evaluating the Performance of a Classification Tree</p> <p>9.4 Avoiding Overfitting</p> <p>9.5 Classification Rules from Trees</p> <p>9.6 Classification Trees for More Than Two Classes</p> <p>9.7 Regression Trees</p> <p>9.8 Improving Prediction: Random Forests and Boosted Trees</p> <p>9.9 Advantages and Weaknesses of a Tree</p> <p>Problems</p>
<p><b>Chapter 10</b> <b>Logistic Regression</b></p> <p>10.1 Introduction</p> <p>10.2 The Logistic Regression Model</p> <p>10.3 Example: Acceptance of Personal Loan</p> <p>10.4 Evaluating Classification Performance</p> <p>10.5 Logistic Regression for Multi-class Classification</p> <p>10.6 Example of Complete Analysis: Predicting Delayed Flights</p> <p>Appendix: Using Statmodels</p> <p>Problems</p>
<p><b>Chapter 11</b> <b>Neural Nets</b></p> <p>11.1 Introduction</p> <p>11.2 Concept and Structure of a Neural Network</p> <p>11.3 Fitting a Network to Data</p> <p>11.4 Required User Input</p> <p>11.5 Exploring the Relationship Between Predictors and Outcome</p> <p>11.6 Deep Learning</p> <p>11.7 Advantages and Weaknesses of Neural Networks</p> <p>Problems</p>
<p><b>Chapter 12</b> <b>Discriminant Analysis</b></p> <p>12.1 Introduction</p> <p>12.2 Distance of a Record from a Class</p> <p>12.3 Fisher’s Linear Classification Functions</p> <p>12.4 Classification Performance of Discriminant Analysis</p> <p>12.5 Prior Probabilities</p> <p>12.6 Unequal Misclassification Costs</p> <p>12.7 Classifying More Than Two Classes</p> <p>12.8 Advantages and Weaknesses</p> <p>Problems</p>
<p><b>Chapter 13</b> <b>Combining Methods: Ensembles and Uplift Modeling</b></p> <p>13.1 Ensembles</p> <p>13.2 Uplift (Persuasion) Modeling</p> <p>13.3 Summary</p> <p>Problems</p>
<p><b>Part V Mining Relationships among Records</b></p>
<p><b>Chapter 14</b> <b>Association Rules and Collaborative Filtering</b></p> <p>14.1 Association Rules</p> <p>14.2 Collaborative Filtering</p> <p>14.3 Summary</p> <p>Problems</p>
<p><b>Chapter 15</b> <b>Cluster Analysis</b></p> <p>15.1 Introduction</p> <p>15.2 Measuring Distance Between Two Records</p> <p>15.3 Measuring Distance Between Two Clusters</p> <p>15.4 Hierarchical (Agglomerative) Clustering</p> <p>15.5 Non-Hierarchical Clustering: The <i>k</i>-Means Algorithm</p> <p>Problems</p>
<p><b>Part VI Forecasting Time Series</b></p>
<p><b>Chapter 16</b> <b>Handling Time Series</b></p> <p>16.1 Introduction</p> <p>16.2 Descriptive vs. Predictive Modeling</p> <p>16.3 Popular Forecasting Methods in Business</p> <p>16.4 Time Series Components</p> <p>16.5 Data-Partitioning and Performance Evaluation</p> <p>Problems</p>
<p><b>Chapter 17</b> <b>Regression-Based Forecasting</b></p> <p>17.1 A Model with Trend</p> <p>17.2 A Model with Seasonality</p> <p>17.3 A Model with Trend and Seasonality</p> <p>17.4 Autocorrelation and ARIMA Models</p> <p>Problems</p>
<p><b>Chapter 18</b> <b>Smoothing Methods</b></p> <p>18.1 Introduction</p> <p>18.2 Moving Average</p> <p>18.3 Simple Exponential Smoothing</p> <p>18.4 Advanced Exponential Smoothing</p> <p>Problems</p>
<p><b>Part VII Data Analytics</b></p>
<p><b>Chapter 19</b> <b>Social Network Analytics</b></p> <p>19.1 Introduction</p> <p>19.2 Directed vs. Undirected Networks</p> <p>19.3 Visualizing and Analyzing Networks</p> <p>19.4 Social Data Metrics and Taxonomy</p> <p>19.5 Using Network Metrics in Prediction and Classification</p> <p>19.6 Collecting Social Network Data with Python</p> <p>19.7 Advantages and Disadvantages</p> <p>Problems</p>
<p><b>Chapter 20</b> <b>Text Mining</b></p> <p>20.1 Introduction</p> <p>20.2 The Tabular Representation of Text: Term-Document Matrix and “Bag-of-Words”</p> <p>20.3 Bag-of-Words vs. Meaning Extraction at Document Level</p> <p>20.4 Preprocessing the Text</p> <p>20.5 Implementing Data Mining Methods</p> <p>20.6 Example: Online Discussions on Autos and Electronics</p> <p>20.7 Summary</p> <p>Problems</p>
<p><b>Part VIII Cases</b></p>
<p><b>Chapter 21</b> <b>Cases</b></p> <p>21.1 Charles Book Club</p> <p>21.2 German Credit</p> <p>21.3 Tayko Software Cataloger</p> <p>21.4 Political Persuasion</p> <p>21.5 Taxi Cancellations</p> <p>21.6 Segmenting Consumers of Bath Soap</p> <p>21.7 Direct-Mail Fundraising</p> <p>21.8 Catalog Cross-Selling</p> <p>21.9 Time Series Case: Forecasting Public Transportation Demand</p>
<p>References</p> <p>Data Files Used in the Book</p> <p>Python Utilities Functions</p> <p>Index</p>
<p><b>GALIT SHMUELI, PhD,</b> is Distinguished Professor at National Tsing Hua University's Institute of Service Science. She has designed and instructed data mining courses since 2004 at University of Maryland, Statistics.com, Indian School of Business, and National Tsing Hua University, Taiwan. Professor Shmueli is known for her research and teaching in business analytics, with a focus on statistical and data mining methods in information systems and healthcare. She has authored over 100 publications including books.</p> <p><b>PETER C. BRUCE</b> is President and Founder of the Institute for Statistics Education at Statistics.com. He has written multiple journal articles and is the developer of Resampling Stats software. He is the author of <i>Introductory Statistics and Analytics: A Resampling Perspective</i> (Wiley) and co-author of <i>Practical Statistics for Data Scientists: 50 Essential Concepts</i> (O'Reilly).</p> <p><b>PETER GEDECK, PhD,</b> is a Senior Data Scientist at Collaborative Drug Discovery, where he helps develop cloud-based software to manage the huge amount of data involved in the drug discovery process. He also teaches data mining at Statistics.com.</p> <p><b>NITIN R. PATEL, PhD,</b> is cofounder and board member of Cytel Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years.</p>

You might also be interested in these products:

Statistics for Microarrays
by: Ernst Wit, John McClure
PDF ebook
90,99 €