Details

Machine Learning

Hands-On for Developers and Technical Professionals
2nd ed.

by: Jason Bell
€34.99
Publisher:	Wiley
Format:	EPUB
Published:	17 February 2020
ISBN/EAN:	9781119642190
Language:	English
Number of pages:	432

DRM-protected eBook; to read it you need, for example, Adobe Digital Editions and an Adobe ID.

Descriptions

Title description

Dig deep into the data with a hands-on guide to machine learning, with updated examples and more!

Machine Learning: Hands-On for Developers and Technical Professionals provides hands-on instruction and fully coded working examples for the most common machine learning techniques used by developers and technical professionals. The book contains a breakdown of each ML variant, explaining how it works and how it is used within certain industries, allowing readers to incorporate the presented techniques into their own work as they follow along. A core tenet of machine learning is a strong focus on data preparation, and a full exploration of the various types of learning algorithms illustrates how the proper tools can help any developer extract information and insights from existing data. The book includes a full complement of Instructor's Materials to facilitate use in the classroom, making this resource useful for students and as a professional reference.

At its core, machine learning is a mathematical, algorithm-based technology that forms the basis of historical data mining and modern big data science. Scientific analysis of big data requires a working knowledge of machine learning, which forms predictions based on known properties learned from training data. Machine Learning is an accessible, comprehensive guide for the non-mathematician, providing clear guidance that allows readers to:

- Learn the languages of machine learning including Hadoop, Mahout, and Weka
- Understand decision trees, Bayesian networks, and artificial neural networks
- Implement Association Rule, Real Time, and Batch learning
- Develop a strategic plan for safe, effective, and efficient machine learning

By learning to construct a system that can learn from data, readers can increase their utility across industries.

Machine learning sits at the core of deep-dive data analysis and visualization, which is increasingly in demand as companies discover the goldmine hiding in their existing data. For the tech professional involved in data science, Machine Learning: Hands-On for Developers and Technical Professionals provides the skills and techniques required to dig deeper.

Table of contents

Introduction

Chapter 1 What is Machine Learning?
History of Machine Learning; Alan Turing; Arthur Samuel; Tom M. Mitchell; Summary Definition; Algorithm Types for Machine Learning; Supervised Learning; Unsupervised Learning; The Human Touch; Uses for Machine Learning; Software; Stock Trading; Robotics; Medicine and Healthcare; Advertising; Retail and E-commerce; Gaming Analytics; The Internet of Things; Languages for Machine Learning; Python; R; Matlab; Scala; Ruby; Software Used in This Book; Checking the Java Version; Weka Toolkit; DeepLearning4J; Kafka; Spark and Hadoop; Text Editors and IDEs; Data Repositories; UC Irvine Machine Learning Repository; Kaggle; Summary

Chapter 2 Planning for Machine Learning
The Machine Learning Cycle; It All Starts with a Question; I Don’t Have Data!; Starting Local; Transfer Learning; Competitions; One Solution Fits All?; Defining the Process; Planning; Developing; Testing; Reporting; Refining; Production; Avoiding Bias; Building a Data Team; Mathematics and Statistics; Programming; Graphic Design; Domain Knowledge; Data Processing; Using Your Computer; A Cluster of Machines; Cloud-Based Services; Data Storage; Physical Discs; Cloud-Based Storage; Data Privacy; Cultural Norms; Generational Expectations; The Anonymity of User Data; Don’t Cross the “Creepy Line”; Data Quality and Cleaning; Presence Checks; Type Checks; Length Checks; Range Checks; Format Checks; The Britney Dilemma; What’s in a Country Name?; Dates and Times; Final Thoughts on Data Cleaning; Thinking About Input Data; Raw Text; Comma-Separated Variables; JSON; YAML; XML; Spreadsheets; Databases; Thinking About Output Data; Don’t Be Afraid to Experiment; Summary

Chapter 3 Data Acquisition Techniques
Scraping Data; Copy and Paste; Google Sheets; Using an API; Acquiring Weather Data; Migrating Data; Installing Embulk; Using the Quick Run; Installing Plugins; Migrating Files to Database; Bulk Converting CSV to JSON; Summary

Chapter 4 Statistics, Linear Regression, and Randomness
Working with a Basic Dataset; Loading and Converting the Dataset; Introducing Basic Statistics; Minimum and Maximum Values; Sum; Mean; Arithmetic Mean; Harmonic Mean; Geometric Mean; The Relationship Between the Three Averages; Mode; Median; Range; Interquartile Ranges; Variance; Standard Deviation; Using Simple Linear Regression; Using Your Spreadsheet; Writing a Program; Embracing Randomness; Finding Pi with Random Numbers; Using Monte Carlo Pi in Clojure; Summary

Chapter 5 Working with Decision Trees
The Basics of Decision Trees; Uses for Decision Trees; Advantages of Decision Trees; Limitations of Decision Trees; Different Algorithm Types; How Decision Trees Work; Decision Trees in Weka; The Requirement; Training Data; Using Weka to Create a Decision Tree; Creating Java Code from the Classification; Testing the Classifier Code; Thinking About Future Iterations; Summary

Chapter 6 Clustering
What is Clustering?; Where is Clustering Used?; The Internet; Business and Retail; Law Enforcement; Computing; Clustering Models; How the K-Means Works; Calculating the Number of Clusters in a Dataset; K-Means Clustering with Weka; Preparing the Data; The Workbench Method; The Command-Line Method; Converting CSV File to ARFF; The Coded Method; Summary

Chapter 7 Association Rules Learning
Where is Association Rules Learning Used?; Web Usage Mining; Beer and Diapers; How Association Rules Learning Works; Support; Confidence; Lift; Conviction; Defining the Process; Algorithms; Apriori; FP-Growth; Mining the Baskets: A Walk-Through; The Raw Basket Data; Using the Weka Application; Inspecting the Results; Summary

Chapter 8 Support Vector Machines
What is a Support Vector Machine?; Where are Support Vector Machines Used?; The Basic Classification Principles; Binary and Multiclass Classification; Linear Classifiers; Confidence; Maximizing and Minimizing to Find the Line; How Support Vector Machines Approach Classification; Using Linear Classification; Using Non-Linear Classification; Using Support Vector Machines in Weka; Installing LibSVM; A Classification Walk-Through; Implementing LibSVM with Java; Summary

Chapter 9 Artificial Neural Networks
What is a Neural Network?; Artificial Neural Network Uses; High-Frequency Trading; Credit Applications; Data Center Management; Robotics; Medical Monitoring; Trusting the Black Box; Breaking Down the Artificial Neural Network; Perceptrons; Activation Functions; Multilayer Perceptrons; Back Propagation; Data Preparation for Artificial Neural Networks; Artificial Neural Networks with Weka; Generating a Dataset; Loading the Data into Weka; Configuring the Multilayer Perceptron; Training the Network; Altering the Network; Increasing the Test Data Size; Implementing a Neural Network in Java; Creating the Project; Writing the Code; Converting from CSV to Arff; Running the Neural Network; Developing Neural Networks with DeepLearning4J; Modifying the Data; Viewing Maven Dependencies; Handling the Training Data; Normalizing Data; Building the Model; Evaluating the Model; Saving the Model; Building and Executing the Program; Summary

Chapter 10 Machine Learning with Text Documents
Preparing Text for Analysis; Apache Tika; Cleaning the Text Data; Stopwords; Stemming; N-grams; TF/IDF; Loading the Documents; Calculating the Term Frequency; Calculating the Inverse Document Frequency; Computing the TF/IDF Score; Reviewing the Final Code Listing; Word2Vec; Loading the Raw Text Data; Tokenizing the Strings; Creating the Model; Evaluating the Model; Reviewing the Final Code; Basic Sentiment Analysis; Loading Positive and Negative Words; Loading Sentences; Calculating the Sentiment Score; Reviewing the Final Code; Performing a Test Run; Further Development; Summary

Chapter 11 Machine Learning with Images
What is an Image?; Introducing Color Depth; Images in Machine Learning; Basic Classification with Neural Networks; Basic Settings; Loading the MNIST Images; Model Configuration; Model Training; Model Evaluation; Convolutional Neural Networks; How CNNs Work; CNN Demonstration; Downloading the Image Data; Basic Setup; Handling the Training and Test Data; Image Preparation; CNN Model Configuration; Model Training; Model Evaluation; Saving the Model; Transfer Learning; Summary

Chapter 12 Machine Learning Streaming with Kafka
What You Will Learn in This Chapter; From Machine Learning to Machine Learning Engineer; From Batch Processing to Streaming Data Processing; What is Kafka?; How Does It Work?; Fault Tolerance; Further Reading; Installing Kafka; Kafka as a Single-Node Cluster; Kafka as a Multinode Cluster; Topics Management; Creating Topics; Finding Out Information About Existing Topics; Deleting Topics; Sending Messages from the Command Line; Receiving Messages from the Command Line; Kafka Tool UI; Writing Your Own Producers and Consumers; Producers in Java; Consumers in Java; Building and Running the Applications; The Streaming API; Building a Streaming Machine Learning System; Planning the System; Continuous Training; Determining Which Models to Use for Predictions; Determining Which Algorithms to Use; Simple Linear Regression; Neural Network; Kafka Topics; Creating the Topics; Kafka Connect; Why Persist the Event Data?; The REST API Microservice; Processing Commands and Events; Finding Kafka Brokers; A Command or an Event?; Making Predictions; Prediction Streaming API; Prediction Functions; Predicting Linear Regression; Predicting the Neural Network Model; Running the Project; Run MySQL; Run Zookeeper; Run Kafka; Create the Topics; Run Kafka Connect; Model Builds; Run Events Streaming Application; Run Prediction Streaming Application; Start the API; Send JSON Training Data; Train a Model; Make a Prediction; Summary

Chapter 13 Apache Spark
Spark: A Hadoop Replacement?; Java, Scala, or Python?; Downloading and Installing Spark; A Quick Intro to Spark; Starting the Shell; Data Sources; Testing Spark; Spark Monitor; Comparing Hadoop MapReduce to Spark; Writing Stand-Alone Programs with Spark; Spark Programs in Java; Spark Program Summary; Spark SQL; Basic Concepts; Wrapping Up SparkSQL; Spark Streaming; Basic Concepts; Creating Your First Spark Stream; Spark Streams from Kafka; MLib: The Machine Learning Library; Dependencies; Decision Trees; Clustering; Association Rules with FP-Growth; Summary

Chapter 14 Machine Learning with R
Installing R; macOS; Windows; Linux; Your First Run; Installing R-Studio; The R Basics; Variables and Vectors; Matrices; Lists; Data Frames; Installing Packages; Loading in Data; Plotting Data; Simple Statistics; Simple Linear Regression; Creating the Data; The Initial Graph; Regression with the Linear Model; Making a Prediction; Basic Sentiment Analysis; Using Functions to Load in Word Lists; Writing a Function to Score Sentiment; Testing the Function; Apriori Association Rules; Installing the arules Package; Gathering the Training Data; Importing the Transaction Data; Running the Apriori Algorithm; Inspecting the Results; Accessing R from Java; Installing the rJava Package; Creating Your First Java Code in R; Calling R from Java Programs; Setting Up an Eclipse Project; Creating the Java/R Class; Running the Example; Extending Your R Implementations; Connecting to Social Media with R; Summary

Appendix A Kafka Quick Start
Installing Kafka; Starting Zookeeper; Starting Kafka; Creating Topics; Listing Topics; Describing a Topic; Deleting Topics; Running a Console Producer; Running a Console Consumer

Appendix B The Twitter API Developer Application Configuration

Appendix C Useful Unix Commands
Using Sample Data; Showing the Contents: cat, more, and less; Example Command; Expected Output; Filtering Content: grep; Example Command for Finding Text; Example Output; Sorting Data: sort; Example Command for Basic Sorting; Example Output; Finding Unique Occurrences: uniq; Showing the Top of a File: head; Counting Words: wc; Locating Anything: find; Combining Commands and Redirecting Output; Picking a Text Editor; Colon Frenzy: Vi and Vim; Nano; Emacs

Appendix D Further Reading
Machine Learning; Statistics; Big Data and Data Science; Visualization; Making Decisions; Datasets; Blogs; Useful Websites; The Tools of the Trade

Index

About the author

JASON BELL has worked in software development for over thirty years. He now focuses on large-volume data solutions and on helping retail and finance customers gain insight from that data with machine learning. He is also an active committee member for several international technology conferences.

Back cover copy

Learn more from your data with this hands-on guide to machine learning

If you want to get into machine learning but fear the math, this book is your ultimate guide. Specifically designed for non-mathematicians, this useful guide presents a breakdown of each variant of machine learning, with examples and working code. You'll learn the various algorithms, data preparation techniques, trees, and networks, and get acquainted with the tools that help you get more from your data. You'll understand how it works, where it's used, and how to make it great.

- Learn the languages of machine learning: Weka, DeepLearning4J, Spark™, and R
- Make the right data storage and cleaning decisions, tailored to your desired output
- Understand decision trees, K-means clustering, artificial neural networks, and association rule learning
- Implement support vector machines knowing the relevant advantages and limitations
- Incorporate Big Data processing techniques with Spark and MLlib
- Use Apache Kafka to capture streaming data and learn in real time
- Access the tools you need to plan your project and acquire and process data
- Study examples and use provided working code for hands-on learning