Details

Data Mining Techniques

For Marketing, Sales, and Customer Relationship Management
3rd edition

by: Gordon S. Linoff, Michael J. A. Berry

39,99 €

Publisher: Wiley
Format: EPUB
Published: March 23, 2011
ISBN/EAN: 9781118087459
Language: English
Number of pages: 896

DRM-protected eBook. To read it, you will need an application such as Adobe Digital Editions and an Adobe ID.

Description

The leading introductory book on data mining, fully updated and revised! <p>When Berry and Linoff wrote the first edition of <i>Data Mining Techniques</i> in the late 1990s, data mining was just starting to move out of the lab and into the office; it has since grown into an indispensable tool of modern business. This new edition, more than 50 percent new and revised, is a significant update from the previous one and shows you how to harness the newest data mining methods and techniques to solve common business problems. The unparalleled author team shares invaluable advice for improving response rates to direct marketing campaigns, identifying new customer segments, and estimating credit risk. In addition, they cover more advanced topics such as preparing data for analysis and creating the necessary infrastructure for data mining at your company.</p> <ul> <li>Significantly updated since the previous edition, with current best practices for using data mining methods and techniques to solve common business problems</li> <li>Covers a new data mining technique in every chapter, along with clear, concise explanations of how to apply each technique immediately</li> <li>Touches on core data mining techniques, including decision trees, neural networks, collaborative filtering, association rules, link analysis, survival analysis, and more</li> <li>Provides best practices for performing data mining using simple tools such as Excel</li> </ul> <p><i>Data Mining Techniques, Third Edition</i> covers a new data mining technique with each successive chapter and then demonstrates how you can apply that technique for improved marketing, sales, and customer support to get immediate results.</p>
<p>Introduction xxxvii</p> <p><b>Chapter 1 What Is Data Mining and Why Do It? 1</b></p> <p>What Is Data Mining? 2</p> <p>Data Mining Is a Business Process 2</p> <p>Large Amounts of Data 3</p> <p>Meaningful Patterns and Rules 3</p> <p>Data Mining and Customer Relationship Management 4</p> <p>Why Now? 6</p> <p>Data Is Being Produced 6</p> <p>Data Is Being Warehoused 6</p> <p>Computing Power Is Affordable 7</p> <p>Interest in Customer Relationship Management Is Strong 7</p> <p>Commercial Data Mining Software Products Have Become Available 8</p> <p>Skills for the Data Miner 9</p> <p>The Virtuous Cycle of Data Mining 9</p> <p>A Case Study in Business Data Mining 11</p> <p>Identifying BofA’s Business Challenge 12</p> <p>Applying Data Mining 12</p> <p>Acting on the Results 13</p> <p>Measuring the Effects of Data Mining 14</p> <p>Steps of the Virtuous Cycle 15</p> <p>Identify Business Opportunities 16</p> <p>Transform Data into Information 17</p> <p>Act on the Information 19</p> <p>Measure the Results 20</p> <p>Data Mining in the Context of the Virtuous Cycle 23</p> <p>Lessons Learned 26</p> <p><b>Chapter 2 Data Mining Applications in Marketing and Customer Relationship Management 27</b></p> <p>Two Customer Lifecycles 27</p> <p>The Customer’s Lifecycle 28</p> <p>The Customer Lifecycle 28</p> <p>Subscription Relationships versus Event-Based Relationships 30</p> <p>Organize Business Processes Around the Customer Lifecycle 32</p> <p>Customer Acquisition 33</p> <p>Customer Activation 36</p> <p>Customer Relationship Management 37</p> <p>Winback 38</p> <p>Data Mining Applications for Customer Acquisition 38</p> <p>Identifying Good Prospects 39</p> <p>Choosing a Communication Channel 39</p> <p>Picking Appropriate Messages 40</p> <p>A Data Mining Example: Choosing the Right Place to Advertise 40</p> <p>Who Fits the Profile? 
41</p> <p>Measuring Fitness for Groups of Readers 44</p> <p>Data Mining to Improve Direct Marketing Campaigns 45</p> <p>Response Modeling 46</p> <p>Optimizing Response for a Fixed Budget 47</p> <p>Optimizing Campaign Profitability 49</p> <p>Reaching the People Most Influenced by the Message 53</p> <p>Using Current Customers to Learn About Prospects 54</p> <p>Start Tracking Customers Before They Become “Customers” 55</p> <p>Gather Information from New Customers 55</p> <p>Acquisition-Time Variables Can Predict Future Outcomes 56</p> <p>Data Mining Applications for Customer Relationship Management 56</p> <p>Matching Campaigns to Customers 56</p> <p>Reducing Exposure to Credit Risk 58</p> <p>Determining Customer Value 59</p> <p>Cross-selling, Up-selling, and Making Recommendations 60</p> <p>Retention 60</p> <p>Recognizing Attrition 60</p> <p>Why Attrition Matters 61</p> <p>Different Kinds of Attrition 62</p> <p>Different Kinds of Attrition Model 63</p> <p>Beyond the Customer Lifecycle 64</p> <p>Lessons Learned 65</p> <p><b>Chapter 3 The Data Mining Process 67</b></p> <p>What Can Go Wrong? 68</p> <p>Learning Things That Aren’t True 68</p> <p>Learning Things That Are True, but Not Useful 73</p> <p>Data Mining Styles 74</p> <p>Hypothesis Testing 75</p> <p>Directed Data Mining 81</p> <p>Undirected Data Mining 81</p> <p>Goals, Tasks, and Techniques 82</p> <p>Data Mining Business Goals 82</p> <p>Data Mining Tasks 83</p> <p>Data Mining Techniques 88</p> <p>Formulating Data Mining Problems: From Goals to Tasks to Techniques 88</p> <p>What Techniques for Which Tasks? 95</p> <p>Is There a Target or Targets? 96</p> <p>What Is the Target Data Like? 96</p> <p>What Is the Input Data Like? 96</p> <p>How Important Is Ease of Use? 97</p> <p>How Important Is Model Explicability? 
97</p> <p>Lessons Learned 98</p> <p><b>Chapter 4 Statistics 101: What You Should Know About Data 101</b></p> <p>Occam’s Razor 103</p> <p>Skepticism and Simpson’s Paradox 103</p> <p>The Null Hypothesis 104</p> <p>P-Values 105</p> <p>Looking At and Measuring Data 106</p> <p>Categorical Values 106</p> <p>Numeric Variables 117</p> <p>A Couple More Statistical Ideas 120</p> <p>Measuring Response 120</p> <p>Standard Error of a Proportion 121</p> <p>Comparing Results Using Confidence Bounds 123</p> <p>Comparing Results Using Difference of Proportions 124</p> <p>Size of Sample 125</p> <p>What the Confidence Interval Really Means 126</p> <p>Size of Test and Control for an Experiment 127</p> <p>Multiple Comparisons 129</p> <p>The Confidence Level with Multiple Comparisons 129</p> <p>Bonferroni’s Correction 129</p> <p>Chi-Square Test 130</p> <p>Expected Values 130</p> <p>Chi-Square Value 132</p> <p>Comparison of Chi-Square to Difference of Proportions 134</p> <p>An Example: Chi-Square for Regions and Starts 134</p> <p>Case Study: Comparing Two Recommendation Systems with an A/B Test 138</p> <p>First Metric: Participating Sessions 140</p> <p>Data Mining and Statistics 144</p> <p>Lessons Learned 148</p> <p><b>Chapter 5 Descriptions and Prediction: Profiling and Predictive Modeling 151</b></p> <p>Directed Data Mining Models 152</p> <p>Defining the Model Structure and Target 152</p> <p>Incremental Response Modeling 154</p> <p>Model Stability 156</p> <p>Time-Frames in the Model Set 157</p> <p>Directed Data Mining Methodology 159</p> <p>Step 1: Translate the Business Problem into a Data Mining Problem 161</p> <p>How Will Results Be Used? 163</p> <p>How Will Results Be Delivered? 163</p> <p>The Role of Domain Experts and Information Technology 164</p> <p>Step 2: Select Appropriate Data 165</p> <p>What Data Is Available? 166</p> <p>How Much Data Is Enough? 167</p> <p>How Much History Is Required? 167</p> <p>How Many Variables? 168</p> <p>What Must the Data Contain? 
168</p> <p>Step 3: Get to Know the Data 169</p> <p>Examine Distributions 169</p> <p>Compare Values with Descriptions 170</p> <p>Validate Assumptions 170</p> <p>Ask Lots of Questions 171</p> <p>Step 4: Create a Model Set 172</p> <p>Assembling Customer Signatures 172</p> <p>Creating a Balanced Sample 172</p> <p>Including Multiple Timeframes 174</p> <p>Creating a Model Set for Prediction 174</p> <p>Creating a Model Set for Profiling 176</p> <p>Partitioning the Model Set 176</p> <p>Step 5: Fix Problems with the Data 177</p> <p>Categorical Variables with Too Many Values 177</p> <p>Numeric Variables with Skewed Distributions and Outliers 178</p> <p>Missing Values 178</p> <p>Values with Meanings That Change over Time 179</p> <p>Inconsistent Data Encoding 179</p> <p>Step 6: Transform Data to Bring Information to the Surface 180</p> <p>Step 7: Build Models 180</p> <p>Step 8: Assess Models 180</p> <p>Assessing Binary Response Models and Classifiers 181</p> <p>Assessing Binary Response Models Using Lift 182</p> <p>Assessing Binary Response Model Scores Using Lift Charts 184</p> <p>Assessing Binary Response Model Scores Using Profitability Models 185</p> <p>Assessing Binary Response Models Using ROC Charts 186</p> <p>Assessing Estimators 188</p> <p>Assessing Estimators Using Score Rankings 189</p> <p>Step 9: Deploy Models 190</p> <p>Practical Issues in Deploying Models 190</p> <p>Optimizing Models for Deployment 191</p> <p>Step 10: Assess Results 191</p> <p>Step 11: Begin Again 193</p> <p>Lessons Learned 193</p> <p><b>Chapter 6 Data Mining Using Classic Statistical Techniques 195</b></p> <p>Similarity Models 196</p> <p>Similarity and Distance 196</p> <p>Example: A Similarity Model for Product Penetration 197</p> <p>Table Lookup Models 203</p> <p>Choosing Dimensions 204</p> <p>Partitioning the Dimensions 205</p> <p>From Training Data to Scores 205</p> <p>Handling Sparse and Missing Data by Removing Dimensions 205</p> <p>RFM: A Widely Used Lookup Model 206</p> <p>RFM Cell 
Migration 207</p> <p>RFM and the Test-and-Measure Methodology 208</p> <p>RFM and Incremental Response Modeling 209</p> <p>Naïve Bayesian Models 210</p> <p>Some Ideas from Probability 210</p> <p>The Naïve Bayesian Calculation 212</p> <p>Comparison with Table Lookup Models 213</p> <p>Linear Regression 213</p> <p>The Best-fit Line 215</p> <p>Goodness of Fit 217</p> <p>Multiple Regression 220</p> <p>The Equation 220</p> <p>The Range of the Target Variable 221</p> <p>Interpreting Coefficients of Linear Regression Equations 221</p> <p>Capturing Local Effects with Linear Regression 223</p> <p>Additional Considerations with Multiple Regression 224</p> <p>Variable Selection for Multiple Regression 225</p> <p>Logistic Regression 227</p> <p>Modeling Binary Outcomes 227</p> <p>The Logistic Function 229</p> <p>Fixed Effects and Hierarchical Effects 231</p> <p>Hierarchical Effects 232</p> <p>Within and Between Effects 232</p> <p>Fixed Effects 233</p> <p>Lessons Learned 234</p> <p><b>Chapter 7 Decision Trees 237</b></p> <p>What Is a Decision Tree and How Is It Used? 
238</p> <p>A Typical Decision Tree 238</p> <p>Using the Tree to Learn About Churn 240</p> <p>Using the Tree to Learn About Data and Select Variables 241</p> <p>Using the Tree to Produce Rankings 243</p> <p>Using the Tree to Estimate Class Probabilities 243</p> <p>Using the Tree to Classify Records 244</p> <p>Using the Tree to Estimate Numeric Values 244</p> <p>Decision Trees Are Local Models 245</p> <p>Growing Decision Trees 247</p> <p>Finding the Initial Split 248</p> <p>Growing the Full Tree 251</p> <p>Finding the Best Split 252</p> <p>Gini (Population Diversity) as a Splitting Criterion 253</p> <p>Entropy Reduction or Information Gain as a Splitting Criterion 254</p> <p>Information Gain Ratio 256</p> <p>Chi-Square Test as a Splitting Criterion 256</p> <p>Incremental Response as a Splitting Criterion 258</p> <p>Reduction in Variance as a Splitting Criterion for Numeric Targets 259</p> <p>F Test 262</p> <p>Pruning 262</p> <p>The CART Pruning Algorithm 263</p> <p>Pessimistic Pruning: The C5.0 Pruning Algorithm 267</p> <p>Stability-Based Pruning 268</p> <p>Extracting Rules from Trees 269</p> <p>Decision Tree Variations 270</p> <p>Multiway Splits 270</p> <p>Splitting on More Than One Field at a Time 271</p> <p>Creating Nonrectangular Boxes 271</p> <p>Assessing the Quality of a Decision Tree 275</p> <p>When Are Decision Trees Appropriate? 
276</p> <p>Case Study: Process Control in a Coffee Roasting Plant 277</p> <p>Goals for the Simulator 277</p> <p>Building a Roaster Simulation 278</p> <p>Evaluation of the Roaster Simulation 278</p> <p>Lessons Learned 279</p> <p><b>Chapter 8 Artificial Neural Networks 281</b></p> <p>A Bit of History 282</p> <p>The Biological Model 283</p> <p>The Biological Neuron 285</p> <p>The Biological Input Layer 286</p> <p>The Biological Output Layer 287</p> <p>Neural Networks and Artificial Intelligence 287</p> <p>Artificial Neural Networks 288</p> <p>The Artificial Neuron 288</p> <p>The Multi-Layer Perceptron 291</p> <p>A Network Example 292</p> <p>Network Topologies 293</p> <p>A Sample Application: Real Estate Appraisal 295</p> <p>Training Neural Networks 299</p> <p>How Does a Neural Network Learn Using Back Propagation? 299</p> <p>Pruning a Neural Network 300</p> <p>Radial Basis Function Networks 303</p> <p>Overview of RBF Networks 303</p> <p>Choosing the Locations of the Radial Basis Functions 305</p> <p>Universal Approximators 305</p> <p>Neural Networks in Practice 308</p> <p>Choosing the Training Set 309</p> <p>Coverage of Values for All Features 309</p> <p>Number of Features 310</p> <p>Size of Training Set 310</p> <p>Number and Range of Outputs 310</p> <p>Rules of Thumb for Using MLPs 310</p> <p>Preparing the Data 311</p> <p>Interpreting the Output from a Neural Network 313</p> <p>Neural Networks for Time Series 315</p> <p>Time Series Modeling 315</p> <p>A Neural Network Time Series Example 316</p> <p>Can Neural Network Models Be Explained? 
317</p> <p>Sensitivity Analysis 318</p> <p>Using Rules to Describe the Scores 318</p> <p>Lessons Learned 319</p> <p><b>Chapter 9 Nearest Neighbor Approaches: Memory-Based Reasoning and Collaborative Filtering 321</b></p> <p>Memory-Based Reasoning 322</p> <p>Look-Alike Models 323</p> <p>Example: Using MBR to Estimate Rents in Tuxedo, New York 324</p> <p>Challenges of MBR 327</p> <p>Choosing a Balanced Set of Historical Records 328</p> <p>Representing the Training Data 328</p> <p>Determining the Distance Function, Combination Function, and Number of Neighbors 331</p> <p>Case Study: Using MBR for Classifying Anomalies in Mammograms 331</p> <p>The Business Problem: Identifying Abnormal Mammograms 332</p> <p>Applying MBR to the Problem 332</p> <p>The Total Solution 334</p> <p>Measuring Distance and Similarity 335</p> <p>What Is a Distance Function? 335</p> <p>Building a Distance Function One Field at a Time 337</p> <p>Distance Functions for Other Data Types 340</p> <p>When a Distance Metric Already Exists 341</p> <p>The Combination Function: Asking the Neighbors for Advice 342</p> <p>The Simplest Approach: One Neighbor 342</p> <p>The Basic Approach for Categorical Targets: Democracy 342</p> <p>Weighted Voting for Categorical Targets 344</p> <p>Numeric Targets 344</p> <p>Case Study: Shazam — Finding Nearest Neighbors for Audio Files 345</p> <p>Why This Feat Is Challenging 346</p> <p>The Audio Signature 347</p> <p>Measuring Similarity 348</p> <p>Collaborative Filtering: A Nearest-Neighbor Approach to Making Recommendations 351</p> <p>Building Profiles 352</p> <p>Comparing Profiles 352</p> <p>Making Predictions 353</p> <p>Lessons Learned 354</p> <p><b>Chapter 10 Knowing When to Worry: Using Survival Analysis to Understand Customers 357</b></p> <p>Customer Survival 360</p> <p>What Survival Curves Reveal 360</p> <p>Finding the Average Tenure from a Survival Curve 362</p> <p>Customer Retention Using Survival 364</p> <p>Looking at Survival as Decay 365</p> <p>Hazard Probabilities 
367</p> <p>The Basic Idea 368</p> <p>Examples of Hazard Functions 369</p> <p>Censoring 371</p> <p>The Hazard Calculation 372</p> <p>Other Types of Censoring 375</p> <p>From Hazards to Survival 376</p> <p>Retention 376</p> <p>Survival 378</p> <p>Comparison of Retention and Survival 378</p> <p>Proportional Hazards 380</p> <p>Examples of Proportional Hazards 381</p> <p>Stratification: Measuring Initial Effects on Survival 382</p> <p>Cox Proportional Hazards 382</p> <p>Survival Analysis in Practice 385</p> <p>Handling Different Types of Attrition 385</p> <p>When Will a Customer Come Back? 387</p> <p>Understanding Customer Value 389</p> <p>Forecasting 392</p> <p>Hazards Changing over Time 393</p> <p>Lessons Learned 394</p> <p><b>Chapter 11 Genetic Algorithms and Swarm Intelligence 397</b></p> <p>Optimization 398</p> <p>What Is an Optimization Problem? 398</p> <p>An Optimization Problem in Ant World 399</p> <p>E Pluribus Unum 400</p> <p>A Smarter Ant 401</p> <p>Genetic Algorithms 403</p> <p>A Bit of History 404</p> <p>Genetics on Computers 404</p> <p>Representing the Genome 413</p> <p>Schemata: The Building Blocks of Genetic Algorithms 414</p> <p>Beyond the Simple Algorithm 417</p> <p>The Traveling Salesman Problem 418</p> <p>Exhaustive Search 419</p> <p>A Simple Greedy Algorithm 419</p> <p>The Genetic Algorithms Approach 419</p> <p>The Swarm Intelligence Approach 420</p> <p>Case Study: Using Genetic Algorithms for Resource Optimization 421</p> <p>Case Study: Evolving a Solution for Classifying Complaints 423</p> <p>Business Context 424</p> <p>Data 425</p> <p>The Comment Signature 425</p> <p>The Genomes 426</p> <p>The Fitness Function 427</p> <p>The Results 427</p> <p>Lessons Learned 427</p> <p><b>Chapter 12 Tell Me Something New: Pattern Discovery and Data Mining 429</b></p> <p>Undirected Techniques, Undirected Data Mining 431</p> <p>Undirected versus Directed Techniques 431</p> <p>Undirected versus Directed Data Mining 431</p> <p>Case Study: Undirected Data Mining 
Using Directed Techniques 432</p> <p>What Is Undirected Data Mining? 435</p> <p>Data Exploration 435</p> <p>Segmentation and Clustering 436</p> <p>Target Variable Definition, When the Target Is Not Explicit 438</p> <p>Simulation, Forecasting, and Agent-Based Modeling 443</p> <p>Methodology for Undirected Data Mining 455</p> <p>There Is No Methodology 456</p> <p>Things to Keep in Mind 456</p> <p>Lessons Learned 457</p> <p><b>Chapter 13 Finding Islands of Similarity: Automatic Cluster Detection 459</b></p> <p>Searching for Islands of Simplicity 461</p> <p>Customer Segmentation and Clustering 461</p> <p>Similarity Clusters 463</p> <p>Tracking Campaigns by Cluster-Based Segments 464</p> <p>Clustering Reveals an Overlooked Market Segment 466</p> <p>Fitting the Troops 467</p> <p>The K-Means Clustering Algorithm 468</p> <p>Two Steps of the K-Means Algorithm 468</p> <p>Voronoi Diagrams and K-Means Clusters 471</p> <p>Choosing the Cluster Seeds 473</p> <p>Choosing K 473</p> <p>Using K-Means to Detect Outliers 474</p> <p>Semi-Directed Clustering 475</p> <p>Interpreting Clusters 475</p> <p>Characterizing Clusters by Their Centroids 476</p> <p>Characterizing Clusters by What Differentiates Them 477</p> <p>Using Decision Trees to Describe Clusters 478</p> <p>Evaluating Clusters 479</p> <p>Cluster Measurements and Terminology 480</p> <p>Cluster Silhouettes 480</p> <p>Limiting Cluster Diameter for Scoring 483</p> <p>Case Study: Clustering Towns 484</p> <p>Creating Town Signatures 484</p> <p>Creating Clusters 486</p> <p>Determining the Right Number of Clusters 486</p> <p>Evaluating the Clusters 487</p> <p>Using Demographic Clusters to Adjust Zone Boundaries 488</p> <p>Business Success 490</p> <p>Variations on K-Means 490</p> <p>K-Medians, K-Medoids, and K-Modes 490</p> <p>The Soft Side of K-Means 494</p> <p>Data Preparation for Clustering 495</p> <p>Scaling for Consistency 496</p> <p>Use Weights to Encode Outside Information 496</p> <p>Selecting Variables for Clustering 
497</p> <p>Lessons Learned 497</p> <p><b>Chapter 14 Alternative Approaches to Cluster Detection 499</b></p> <p>Shortcomings of K-Means 500</p> <p>Reasonableness 500</p> <p>An Intuitive Example 501</p> <p>Fixing the Problem by Changing the Scales 503</p> <p>What This Means in Practice 504</p> <p>Gaussian Mixture Models 505</p> <p>Adding “Gaussians” to K-Means 505</p> <p>Back to Gaussian Mixture Models 508</p> <p>Scoring GMMs 510</p> <p>Applying GMMs 511</p> <p>Divisive Clustering 513</p> <p>A Decision Tree–Like Method for Clustering 513</p> <p>Scoring Divisive Clusters 515</p> <p>Clusters and Trees 515</p> <p>Agglomerative (Hierarchical) Clustering 516</p> <p>Overview of Agglomerative Clustering Methods 516</p> <p>Clustering People by Age: An Example of An Agglomerative Clustering Algorithm 520</p> <p>Scoring Agglomerative Clusters 522</p> <p>Limitations of Agglomerative Clustering 523</p> <p>Agglomerative Clustering in Practice 525</p> <p>Combining Agglomerative Clustering and K-Means 526</p> <p>Self-Organizing Maps 527</p> <p>What Is a Self-Organizing Map? 527</p> <p>Training an SOM 530</p> <p>Scoring an SOM 531</p> <p>The Search Continues for Islands of Simplicity 532</p> <p>Lessons Learned 533</p> <p><b>Chapter 15 Market Basket Analysis and Association Rules 535</b></p> <p>Defining Market Basket Analysis 536</p> <p>Four Levels of Market Basket Data 537</p> <p>The Foundation of Market Basket Analysis: Basic Measures 539</p> <p>Order Characteristics 540</p> <p>Item (Product) Popularity 541</p> <p>Tracking Marketing Interventions 542</p> <p>Case Study: Spanish or English 543</p> <p>The Business Problem 543</p> <p>The Data 544</p> <p>Defining “Hispanicity” Preference 545</p> <p>The Solution 546</p> <p>Association Analysis 547</p> <p>Rules Are Not Always Useful 548</p> <p>Item Sets to Association Rules 551</p> <p>How Good Is an Association Rule? 
553</p> <p>Building Association Rules 555</p> <p>Choosing the Right Set of Items 556</p> <p>Anonymous Versus Identified 561</p> <p>Generating Rules from All This Data 561</p> <p>Overcoming Practical Limits 565</p> <p>The Problem of Big Data 567</p> <p>Extending the Ideas 569</p> <p>Different Items on the Right- and Left-Hand Sides 569</p> <p>Using Association Rules to Compare Stores 570</p> <p>Association Rules and Cross-Selling 572</p> <p>A Typical Cross-Sell Model 572</p> <p>A More Confident Approach to Product Propensities 573</p> <p>Results from Using Confidence 574</p> <p>Sequential Pattern Analysis 574</p> <p>Finding the Sequences 575</p> <p>Sequential Association Rules 578</p> <p>Sequential Analysis Using Other Data Mining Techniques 579</p> <p>Lessons Learned 579</p> <p><b>Chapter 16 Link Analysis 581</b></p> <p>Basic Graph Theory 582</p> <p>What Is a Graph? 582</p> <p>Directed Graphs 584</p> <p>Weighted Graphs 585</p> <p>Seven Bridges of Königsberg 585</p> <p>Detecting Cycles in a Graph 588</p> <p>The Traveling Salesman Problem Revisited 589</p> <p>Social Network Analysis 593</p> <p>Six Degrees of Separation 593</p> <p>What Your Friends Say About You 595</p> <p>Finding Childcare Benefits Fraud 596</p> <p>Who Responds to Whom on Dating Sites 597</p> <p>Social Marketing 598</p> <p>Mining Call Graphs 598</p> <p>Case Study: Tracking Down the Leader of the Pack 601</p> <p>The Business Goal 601</p> <p>The Data Processing Challenge 601</p> <p>Finding Social Networks in Call Data 602</p> <p>How the Results Are Used for Marketing 602</p> <p>Estimating Customer Age 603</p> <p>Case Study: Who Is Using Fax Machines from Home? 604</p> <p>Why Finding Fax Machines Is Useful 604</p> <p>How Do Fax Machines Behave? 
604</p> <p>A Graph Coloring Algorithm 605</p> <p>“Coloring” the Graph to Identify Fax Machines 606</p> <p>How Google Came to Rule the World 607</p> <p>Hubs and Authorities 608</p> <p>The Details 609</p> <p>Hubs and Authorities in Practice 611</p> <p>Lessons Learned 612</p> <p><b>Chapter 17 Data Warehousing, OLAP, Analytic Sandboxes, and Data Mining 613</b></p> <p>The Architecture of Data 615</p> <p>Transaction Data, the Base Level 616</p> <p>Operational Summary Data 617</p> <p>Decision-Support Summary Data 617</p> <p>Database Schema/Data Models 618</p> <p>Metadata 623</p> <p>Business Rules 623</p> <p>A General Architecture for Data Warehousing 624</p> <p>Source Systems 624</p> <p>Extraction, Transformation, and Load 626</p> <p>Central Repository 627</p> <p>Metadata Repository 630</p> <p>Data Marts 630</p> <p>Operational Feedback 631</p> <p>Users and Desktop Tools 631</p> <p>Analytic Sandboxes 633</p> <p>Why Are Analytic Sandboxes Needed? 634</p> <p>Technology to Support Analytic Sandboxes 636</p> <p>Where Does OLAP Fit In? 639</p> <p>What’s in a Cube? 641</p> <p>Star Schema 646</p> <p>OLAP and Data Mining 648</p> <p>Where Data Mining Fits in with Data Warehousing 650</p> <p>Lots of Data 651</p> <p>Consistent, Clean Data 651</p> <p>Hypothesis Testing and Measurement 652</p> <p>Scalable Hardware and RDBMS Support 653</p> <p>Lessons Learned 653</p> <p><b>Chapter 18 Building Customer Signatures 655</b></p> <p>Finding Customers in Data 656</p> <p>What Is a Customer? 657</p> <p>Accounts? Customers? Households? 658</p> <p>Anonymous Transactions 658</p> <p>Transactions Linked to a Card 659</p> <p>Transactions Linked to a Cookie 659</p> <p>Transactions Linked to an Account 660</p> <p>Transactions Linked to a Customer 661</p> <p>Designing Signatures 661</p> <p>Is a Customer Signature Necessary? 666</p> <p>What Does a Row Represent? 666</p> <p>Will the Signature Be Used for Predictive Modeling? 671</p> <p>Has a Target Been Defined? 
672</p> <p>Are There Constraints Imposed by the Particular Data Mining Techniques to be Employed? 672</p> <p>Which Customers Will Be Included? 673</p> <p>What Might Be Interesting to Know About Customers? 673</p> <p>What a Signature Looks Like 674</p> <p>Process for Creating Signatures 677</p> <p>Some Data Is Already at the Right Level of Granularity 678</p> <p>Pivoting a Regular Time Series 679</p> <p>Aggregating Time-Stamped Transactions 680</p> <p>Dealing with Missing Values 685</p> <p>Missing Values in Source Data 685</p> <p>Unknown or Non-Existent? 687</p> <p>What Not to Do 687</p> <p>Things to Consider 689</p> <p>Lessons Learned 691</p> <p><b>Chapter 19 Derived Variables: Making the Data Mean More 693</b></p> <p>Handset Churn Rate as a Predictor of Churn 694</p> <p>Single-Variable Transformations 696</p> <p>Standardizing Numeric Variables 696</p> <p>Turning Numeric Values into Percentiles 697</p> <p>Turning Counts into Rates 698</p> <p>Relative Measures 699</p> <p>Replacing Categorical Variables with Numeric Ones 700</p> <p>Combining Variables 707</p> <p>Classic Combinations 707</p> <p>Combining Highly Correlated Variables 710</p> <p>Rent to Home Value 712</p> <p>Extracting Features from Time Series 718</p> <p>Trend 719</p> <p>Seasonality 721</p> <p>Extracting Features from Geography 722</p> <p>Geocoding 722</p> <p>Mapping 723</p> <p>Using Geography to Create Relative Measures 724</p> <p>Using Past Values of the Target Variable 725</p> <p>Using Model Scores as Inputs 725</p> <p>Handling Sparse Data 726</p> <p>Account Set Patterns 726</p> <p>Binning Sparse Values 727</p> <p>Capturing Customer Behavior from Transactions 727</p> <p>Widening Narrow Data 728</p> <p>Sphere of Influence as a Predictor of Good Customers 728</p> <p>An Example: Ratings to Rater Profile 730</p> <p>Sample Fields from the Rater Signature 730</p> <p>The Rating Signature and Derived Variables 732</p> <p>Lessons Learned 733</p> <p><b>Chapter 20 Too Much of a Good Thing? 
Techniques for Reducing the Number of Variables 735</b></p> <p>Problems with Too Many Variables 736</p> <p>Risk of Correlation Among Input Variables 736</p> <p>Risk of Overfitting 738</p> <p>The Sparse Data Problem 738</p> <p>Visualizing Sparseness 739</p> <p>Independence 740</p> <p>Exhaustive Feature Selection 743</p> <p>Flavors of Variable Reduction Techniques 744</p> <p>Using the Target 744</p> <p>Original versus New Variables 744</p> <p>Sequential Selection of Features 745</p> <p>The Traditional Forward Selection Methodology 745</p> <p>Forward Selection Using a Validation Set 747</p> <p>Stepwise Selection 748</p> <p>Forward Selection Using Non-Regression Techniques 748</p> <p>Backward Selection 748</p> <p>Undirected Forward Selection 749</p> <p>Other Directed Variable Selection Methods 749</p> <p>Using Decision Trees to Select Variables 750</p> <p>Variable Reduction Using Neural Networks 752</p> <p>Principal Components 753</p> <p>What Are Principal Components? 753</p> <p>Principal Components Example 758</p> <p>Principal Component Analysis 763</p> <p>Factor Analysis 767</p> <p>Variable Clustering 768</p> <p>Example of Variable Clusters 768</p> <p>Using Variable Clusters 770</p> <p>Hierarchical Variable Clustering 770</p> <p>Divisive Variable Clustering 773</p> <p>Lessons Learned 774</p> <p><b>Chapter 21 Listen Carefully to What Your Customers Say: Text Mining 775</b></p> <p>What Is Text Mining? 
776</p> <p>Text Mining for Derived Columns 776</p> <p>Beyond Derived Features 777</p> <p>Text Analysis Applications 778</p> <p>Working with Text Data 781</p> <p>Sources of Text 781</p> <p>Language Effects 782</p> <p>Basic Approaches to Representing Documents 783</p> <p>Representing Documents in Practice 784</p> <p>Documents and the Corpus 786</p> <p>Case Study: Ad Hoc Text Mining 786</p> <p>The Boycott 787</p> <p>Business as Usual 787</p> <p>Combining Text Mining and Hypothesis Testing 787</p> <p>The Results 788</p> <p>Classifying News Stories Using MBR 789</p> <p>What Are the Codes? 789</p> <p>Applying MBR 790</p> <p>The Results 793</p> <p>From Text to Numbers 794</p> <p>Starting with a “Bag of Words” 794</p> <p>Term-Document Matrix 796</p> <p>Corpus Effects 797</p> <p>Singular Value Decomposition (SVD) 798</p> <p>Text Mining and Naïve Bayesian Models 800</p> <p>Naïve Bayesian in the Text World 801</p> <p>Identifying Spam Using Naïve Bayesian 801</p> <p>Sentiment Analysis 806</p> <p>DIRECTV: A Case Study in Customer Service 809</p> <p>Background 809</p> <p>Applying Text Mining 811</p> <p>Taking the Technical Approach 814</p> <p>Not an Iterative Process 818</p> <p>Continuing to Benefit 818</p> <p>Lessons Learned 819</p> <p>Index 821</p>
<p>GORDON S. LINOFF and MICHAEL J. A. BERRY are the founders of Data Miners, Inc., a consultancy specializing in data mining. They have jointly authored two of the leading data mining titles in the field, Data Mining Techniques and Mastering Data Mining (both from Wiley). They each have decades of experience applying data mining techniques to business problems in marketing and customer relationship management.</p>
<p>The newest edition of the leading introductory book on data mining, fully updated and revised</p> <p>Who will remain a loyal customer and who won't? Which messages are most effective with which segments? How can customer value be maximized? This book supplies powerful tools for extracting the answers to these and other crucial business questions from the corporate databases where they lie buried. In the years since the first edition of this book, data mining has grown to become an indispensable tool of modern business. In this latest edition, Linoff and Berry have made extensive updates and revisions to every chapter and added several new ones. The book retains the focus of earlier editions—showing marketing analysts, business managers, and data mining specialists how to harness data mining methods and techniques to solve important business problems. While never sacrificing accuracy for the sake of simplicity, Linoff and Berry present even complex topics in clear, concise English with minimal use of technical jargon or mathematical formulas. Technical topics are illustrated with case studies and practical real-world examples drawn from the authors' experiences, and every chapter contains valuable tips for practitioners. Among the techniques newly covered, or covered in greater depth, are linear and logistic regression models, incremental response (uplift) modeling, naïve Bayesian models, table lookup models, similarity models, radial basis function networks, expectation maximization (EM) clustering, and swarm intelligence. 
New chapters are devoted to data preparation, derived variables, principal components and other variable reduction techniques, and text mining.</p> <p>After establishing the business context with an overview of data mining applications, and introducing aspects of data mining methodology common to all data mining projects, the book covers each important data mining technique in detail.</p> <p>This third edition of Data Mining Techniques covers such topics as:</p> <ul> <li> <p>How to create stable, long-lasting predictive models</p> </li> <li> <p>Data preparation and variable selection</p> </li> <li> <p>Modeling specific targets with directed techniques such as regression, decision trees, neural networks, and memory-based reasoning</p> </li> <li> <p>Finding patterns with undirected techniques such as clustering, association rules, and link analysis</p> </li> <li> <p>Modeling business time-to-event problems such as time to next purchase and expected remaining lifetime</p> </li> <li> <p>Mining unstructured text</p> </li> </ul> <p>The companion website provides data that can be used to test out the various data mining techniques in the book.</p>
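To give a flavor of one technique covered in the book, here is a minimal sketch of the two basic association-rule measures, support and confidence, computed over a toy set of market baskets. This is an illustration only, not material from the book; the transaction data and function names are invented for the example.

```python
# Toy market-basket data: each set is one customer's transaction.
transactions = [
    {"milk", "bread", "beer"},
    {"diapers", "beer"},
    {"milk", "diapers", "beer", "bread"},
    {"milk", "bread"},
    {"diapers", "bread", "beer"},
]

def support(itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Confidence of the rule lhs -> rhs: support of both sides together,
    divided by support of the left-hand side alone."""
    return support(set(lhs) | set(rhs)) / support(lhs)

print(support({"beer"}))                  # 0.8 (4 of 5 baskets contain beer)
print(confidence({"diapers"}, {"beer"}))  # 1.0 (every diapers basket also has beer)
```

A high-confidence, adequate-support rule such as "diapers implies beer" is the kind of pattern the market basket analysis chapter shows how to find and interpret at scale.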

You might also be interested in these products:

The CISO Evolution
by: Matthew K. Sharp, Kyriakos Lambros
PDF ebook
33,99 €
Data Mining and Machine Learning Applications
by: Rohit Raja, Kapil Kumar Nagwanshi, Sandeep Kumar, K. Ramya Laxmi
EPUB ebook
190,99 €