Python® Machine Learning
Published by
John Wiley & Sons, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com
Copyright © 2019 by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978‐1‐119‐54563‐7
ISBN: 978‐1‐119‐54569‐9 (ebk)
ISBN: 978‐1‐119‐54567‐5 (ebk)
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 646‐8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.
Wiley publishes in a variety of print and electronic formats and by print‐on‐demand. Some material included with standard print versions of this book may not be included in e‐books or in print‐on‐demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2019931301
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. Python is a registered trademark of Python Software Foundation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
I dedicate this book with love to my dearest wife (Sze Wa) and girl (Chloe), for enduring my irregular work schedule and for their companionship when I am trying to meet writing deadlines!
Wei‐Meng Lee is a technologist and founder of Developer Learning Solutions (http://www.learn2develop.net), a company specializing in hands‐on training on the latest technologies.
Wei‐Meng has many years of training experience, and his training courses place special emphasis on the learning‐by‐doing approach. His hands‐on approach to learning programming makes understanding the subject much easier than just reading books, tutorials, and documentation.
Wei‐Meng's name regularly appears in online and print publications such as DevX.com, MobiForge.com, and CoDe Magazine. You can contact Wei‐Meng at weimenglee@learn2develop.net.
Doug Mahugh is a software developer who began his career in 1978 as a Fortran programmer for Boeing. Doug has worked for Microsoft since 2005 in a variety of roles including developer advocacy, standards engagement, and content development. Since learning Python in 2008, Doug has written samples and tutorials on topics ranging from caching and continuous integration to Azure Active Directory authentication and Microsoft Graph. Doug has spoken at industry events in over 20 countries, and he has been Microsoft's technical representative to standards bodies including ISO/IEC, Ecma International, OASIS, CalConnect, and others.
Doug currently lives in Seattle with his wife Megan and two Samoyeds named Jamie and Alice.
Devon Lewis
Jim Minatel
Pete Gaughan
Gary Schwartz
Barath Kumar Rajasekaran
Doug Mahugh
Kim Cofer
Nancy Bell
Potomac Indexing, LLC
Wiley
©Lidiia Moor/iStockphoto‐background texture
© Rick_Jo/iStockphoto‐digital robotic brain
Writing a book is always exciting, but along with it come long hours of hard work, straining to get things done accurately and correctly. To make a book possible, a lot of unsung heroes work tirelessly behind the scenes. For this, I would like to take this opportunity to thank a number of special people who made this book possible.
First, I want to thank my acquisitions editor Devon Lewis, who was my first point of contact for this book. Thank you, Devon, for giving me this opportunity and for your trust in me!
Next, a huge thanks to Gary Schwartz, my project editor, who was always a pleasure to work with. Gary is always contactable, even when he is at the airport! Gary has been very patient with me, even though I have missed several of my deadlines for the book. I know it threw a spanner into his plan, but he is always accommodating. Working with him, I know my book is in good hands. Thank you very much, Gary!
Equally important is my technical editor, Doug Mahugh. Doug has been very eagle‐eyed in editing and testing my code, and he never fails to let me know if things do not work the way I intended. Thanks for catching my errors and making the book a better read, Doug! I would also like to take this opportunity to thank my production editor, Barath Kumar Rajasekaran. Without his hard work, this book would not even be possible. Thanks, Barath!
Last, but not least, I want to thank my parents and my wife, Sze Wa, for all the support they have given me. They have selflessly adjusted their schedules to accommodate my busy schedule when I was working on this book. I love you all!
This book covers machine learning, one of the hottest topics in recent years. With computing power increasing exponentially and prices decreasing simultaneously, there is no better time for machine learning. Tasks that used to require huge processing power are now possible on desktop machines. Nevertheless, machine learning is not for the faint of heart: it requires a good foundation in statistics, as well as programming knowledge. Most books on the market are either too superficial or go into so much depth that beginning readers are left gasping for air.
This book will take a gentle approach to this topic. First, it will cover some of the fundamental libraries used in Python that make machine learning possible. In particular, you will learn how to manipulate arrays of numbers using the NumPy library, followed by using the Pandas library to deal with tabular data. Once that is done, you will learn how to visualize data using the matplotlib library, which allows you to plot different types of charts and graphs so that you can visualize your data easily.
Once you have a firm foundation in the basics, I will discuss machine learning using Python and the Scikit‐Learn library. This will give you a solid understanding of how the various machine learning algorithms work behind the scenes.
For this book, I will cover the common machine learning algorithms, such as regression, clustering, and classification.
This book also contains a chapter where you will learn how to perform machine learning using the Microsoft Azure Machine Learning Studio, which allows developers to start building machine learning models using drag‐and‐drop, without needing to code and, most importantly, without requiring a deep knowledge of machine learning.
Finally, I will discuss how you can deploy the models that you have built, so that they can be used by client applications running on mobile and desktop devices.
It is my key intention to make this book accessible to as many developers as possible. To get the most out of this book, you should have some basic knowledge of Python programming and a foundational understanding of basic statistics. Just as you will never learn how to swim just by reading a book, I strongly suggest that you try out the sample code while you are going through the chapters. Go ahead and modify the code and see how the output varies; you will often be surprised by what you can do.
All the sample code in this book is available as Jupyter Notebooks (available for download from Wiley’s support page for this book, www.wiley.com/go/leepythonmachinelearning), so you can download them and try them out immediately.
Without further delay, welcome to Python Machine Learning!