Data Science Programming All-in-One For Dummies®

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit https://hub.wiley.com/community/support/dummies.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2019954497

ISBN 978-1-119-62611-4; ISBN 978-1-119-62613-8 (ebk); ISBN 978-1-119-62614-5 (ebk)

Data Science Programming All-in-One For Dummies®

To view this book's Cheat Sheet, simply go to www.dummies.com and search for “Data Science Programming All-in-One For Dummies Cheat Sheet” in the Search box.

Table of Contents

Cover

Introduction

About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go from Here

Book 1: Defining Data Science

Chapter 1: Considering the History and Uses of Data Science
1. Considering the Elements of Data Science
2. Defining the Role of Data in the World
3. Creating the Data Science Pipeline
4. Comparing Different Languages Used for Data Science
5. Learning to Perform Data Science Tasks Fast
Chapter 2: Placing Data Science within the Realm of AI
1. Seeing the Data to Data Science Relationship
2. Defining the Levels of AI
3. Creating a Pipeline from Data to AI
Chapter 3: Creating a Data Science Lab of Your Own
1. Considering the Analysis Platform Options
2. Choosing a Development Language
3. Obtaining and Using Python
4. Obtaining and Using R
5. Presenting Frameworks
6. Accessing the Downloadable Code
Chapter 4: Considering Additional Packages and Libraries You Might Want
1. Considering the Uses for Third-Party Code
2. Obtaining Useful Python Packages
3. Locating Useful R Libraries
Chapter 5: Leveraging a Deep Learning Framework
1. Understanding Deep Learning Framework Usage
2. Working with Low-End Frameworks
3. Understanding TensorFlow

Book 2: Interacting with Data Storage

Chapter 1: Manipulating Raw Data
1. Defining the Data Sources
2. Considering the Data Forms
3. Understanding the Need for Data Reliability
Chapter 2: Using Functional Programming Techniques
1. Defining Functional Programming
2. Understanding Pure and Impure Languages
3. Comparing the Functional Paradigm
4. Using Python for Functional Programming Needs
5. Understanding How Functional Data Works
6. Working with Lists and Strings
7. Employing Pattern Matching
8. Working with Recursion
9. Performing Functional Data Manipulation
Chapter 3: Working with Scalars, Vectors, and Matrices
1. Considering the Data Forms
2. Defining Data Type through Scalars
3. Creating Organized Data with Vectors
4. Creating and Using Matrices
5. Extending Analysis to Tensors
6. Using Vectorization Effectively
7. Selecting and Shaping Data
8. Working with Trees
9. Representing Relations in a Graph
Chapter 4: Accessing Data in Files
1. Understanding Flat File Data Sources
2. Working with Positional Data Files
3. Accessing Data in CSV Files
4. Moving On to XML Files
5. Considering Other Flat-File Data Sources
6. Working with Nontext Data
7. Downloading Online Datasets
Chapter 5: Working with a Relational DBMS
1. Considering RDBMS Issues
2. Accessing the RDBMS Data
3. Creating a Dataset
4. Mixing RDBMS Products
Chapter 6: Working with a NoSQL DBMS
1. Considering the Ramifications of Hierarchical Data
2. Accessing the Data
3. Interacting with Data from NoSQL Databases
4. Working with Dictionaries
5. Developing Datasets from Hierarchical Data
6. Processing Hierarchical Data into Other Forms

Book 3: Manipulating Data Using Basic Algorithms

Chapter 1: Working with Linear Regression
1. Considering the History of Linear Regression
2. Combining Variables
3. Manipulating Categorical Variables
4. Using Linear Regression to Guess Numbers
5. Learning One Example at a Time
Chapter 2: Moving Forward with Logistic Regression
1. Considering the History of Logistic Regression
2. Differentiating between Linear and Logistic Regression
3. Using Logistic Regression to Guess Classes
4. Switching to Probabilities
5. Working through Multiclass Regression
Chapter 3: Predicting Outcomes Using Bayes
1. Understanding Bayes' Theorem
2. Using Naïve Bayes for Predictions
3. Working with Networked Bayes
4. Considering the Use of Bayesian Linear Regression
5. Considering the Use of Bayesian Logistic Regression
Chapter 4: Learning with K-Nearest Neighbors
1. Considering the History of K-Nearest Neighbors
2. Learning Lazily with K-Nearest Neighbors
3. Leveraging the Correct k Parameter
4. Implementing KNN Regression
5. Implementing KNN Classification

Book 4: Performing Advanced Data Manipulation

Chapter 1: Leveraging Ensembles of Learners
1. Leveraging Decision Trees
2. Working with Almost Random Guesses
3. Meeting Again with Gradient Descent
4. Averaging Different Predictors
Chapter 2: Building Deep Learning Models
1. Discovering the Incredible Perceptron
2. Hitting Complexity with Neural Networks
3. Understanding More about Neural Networks
4. Looking Under the Hood of Neural Networks
5. Explaining Deep Learning Differences with Other Forms of AI
Chapter 3: Recognizing Images with CNNs
1. Beginning with Simple Image Recognition
2. Understanding CNN Image Basics
3. Moving to CNNs with Character Recognition
4. Explaining How Convolutions Work
5. Detecting Edges and Shapes from Images
Chapter 4: Processing Text and Other Sequences
1. Introducing Natural Language Processing
2. Understanding How Machines Read
3. Understanding Semantics Using Word Embeddings
4. Using Scoring and Classification

Book 5: Performing Data-Related Tasks

Chapter 1: Making Recommendations
1. Realizing the Recommendation Revolution
2. Downloading Rating Data
3. Leveraging SVD
Chapter 2: Performing Complex Classifications
1. Using Image Classification Challenges
2. Distinguishing Traffic Signs
Chapter 3: Identifying Objects
1. Distinguishing Classification Tasks
2. Perceiving Objects in Their Surroundings
3. Overcoming Adversarial Attacks on Deep Learning Applications
Chapter 4: Analyzing Music and Video
1. Learning to Imitate Art and Life
2. Mimicking an Artist
3. Moving toward GANs
Chapter 5: Considering Other Task Types
1. Processing Language in Texts
2. Processing Time Series
Chapter 6: Developing Impressive Charts and Plots
1. Starting a Graph, Chart, or Plot
2. Setting the Axis, Ticks, and Grids
3. Defining the Line Appearance
4. Using Labels, Annotations, and Legends
5. Creating Scatterplots
6. Plotting Time Series
7. Plotting Geographical Data
8. Visualizing Graphs

Book 6: Diagnosing and Fixing Errors

Chapter 1: Locating Errors in Your Data
1. Considering the Types of Data Errors
2. Obtaining the Required Data
3. Validating Your Data
4. Manicuring the Data
5. Dealing with Dates in Your Data
Chapter 2: Considering Outrageous Outcomes
1. Deciding What Outrageous Means
2. Considering the Five Mistruths in Data
3. Considering Detection of Outliers
4. Examining a Simple Univariate Method
5. Developing a Multivariate Approach
Chapter 3: Dealing with Model Overfitting and Underfitting
1. Understanding the Causes
2. Determining the Sources of Overfitting and Underfitting
3. Guessing the Right Features
Chapter 4: Obtaining the Correct Output Presentation
1. Considering the Meaning of Correct
2. Determining a Presentation Type
3. Choosing the Right Graph
4. Working with External Data
Chapter 5: Developing Consistent Strategies
1. Standardizing Data Collection Techniques
2. Using Reliable Sources
3. Verifying Dynamic Data Sources
4. Looking for New Data Collection Trends
5. Weeding Old Data
6. Considering the Need for Randomness

Index

About the Authors

Connect with Dummies

End User License Agreement

List of Illustrations

Book 1 Chapter 1

FIGURE 1-1: Loading data into variables so that you can manipulate it.
FIGURE 1-2: Using the variable content to train a linear regression model.
FIGURE 1-3: Outputting a result as a response to the model.

Book 1 Chapter 2

FIGURE 2-1: Deep learning is a subset of machine learning which is a subset of ...

Book 1 Chapter 3

FIGURE 3-1: The setup process begins by telling you whether you have the 64-bit...
FIGURE 3-2: Tell the wizard how to install Anaconda on your system.
FIGURE 3-3: Specify an installation location.
FIGURE 3-4: Configure the advanced installation options.
FIGURE 3-5: Create a folder to use to hold the book’s code.
FIGURE 3-6: Provide a new name for your notebook.
FIGURE 3-7: A notebook contains cells that you use to hold code.
FIGURE 3-8: Your saved notebooks appear in a list in the project folder.
FIGURE 3-9: Notebook warns you before removing any files from the repository.
FIGURE 3-10: The files you want to add to the repository appear as part of an u...
FIGURE 3-11: Colab makes using your Python projects on a tablet easy.
FIGURE 3-12: Azure Notebooks provides another means of running Python code.
FIGURE 3-13: Open an Anaconda Prompt to install R.
FIGURE 3-14: The conda utility tells you which packages it will install.
FIGURE 3-15: Anaconda Navigator provides access to a number of useful tools.
FIGURE 3-16: Changing your environment will often change the available tool lis...
FIGURE 3-17: You can save R code in .r files, but the .r files lack Notebook co...

Book 1 Chapter 5

FIGURE 5-1: Be sure to use the Anaconda prompt for the installation and check t...
FIGURE 5-2: Choose the Visual C++ Build Tools workload to support your Python s...
FIGURE 5-3: Select an environment to use in Anaconda Navigator.

Book 2 Chapter 3

FIGURE 3-1: A tree in Python looks much like the physical alternative.
FIGURE 3-2: Graph nodes can connect to each other in myriad ways.

Book 2 Chapter 4

FIGURE 4-1: A text file contains only text and a little formatting with control...
FIGURE 4-2: Each field in this file consumes precisely the same space.
FIGURE 4-3: This file includes carriage returns for row indicators.
FIGURE 4-4: The raw format of a CSV file is still text and quite readable.
FIGURE 4-5: Use an application such as Excel to create a formatted CSV presenta...
FIGURE 4-6: CSV headers can contain data type information, among other clues.
FIGURE 4-7: XML is a hierarchical format that can become quite complex.
FIGURE 4-8: An Excel file is highly formatted and might contain information of ...
FIGURE 4-9: The image appears onscreen after you render and show it.
FIGURE 4-10: Cropping the image makes it smaller.

Book 2 Chapter 6

FIGURE 6-1: A hierarchical construction relies on links to each item.
FIGURE 6-2: The arrangement of keys when using a BST.
FIGURE 6-3: The arrangement of keys when using a binary heap.
FIGURE 6-4: An example graph that you can use for certain types of data storage...

Book 3 Chapter 1

FIGURE 1-1: Drawing a linear regression line through the data points.
FIGURE 1-2: Developing a multiple regression model.
FIGURE 1-3: Changing the simple linear regression question.
FIGURE 1-4: Seeing the effect of i on y.
FIGURE 1-5: Using a residual plot to see errant data.
FIGURE 1-6: Nonlinear relationship between variable LSTAT and target prices.
FIGURE 1-7: Combined variables LSTAT and RM help to separate high from low pric...
FIGURE 1-8: Adding polynomial features increases the predictive power.
FIGURE 1-9: A slow descent optimizing squared error.

Book 3 Chapter 2

FIGURE 2-1: Contrasting linear to logistic regression.
FIGURE 2-2: Considering the approach to fitting the data.
FIGURE 2-3: Contrasting linear to logistic regression.
FIGURE 2-4: Probabilities do not work as well with a straight line as they do w...
FIGURE 2-5: The plot shows the result of a multiclass regression among three cl...

Book 3 Chapter 3

FIGURE 3-1: Seeing the probabilities for each of the colors.
FIGURE 3-2: Determining how many cars to paint specific colors.
FIGURE 3-3: The interactive version of the Asia Bayesian network is helpful in ...
FIGURE 3-4: A Naïve Bayes model can retrace evidence to the right outcome.
FIGURE 3-5: A visualization of the decision tree built from the play-tennis dat...

Book 3 Chapter 4

FIGURE 4-1: The bull’s-eye dataset, a nonlinear cloud of points that is difficu...
FIGURE 4-2: The KNN approach models the data differently than multiple linear r...

Book 4 Chapter 1

FIGURE 1-1: Comparing a single decision tree output to an ensemble of decision ...
FIGURE 1-2: Seeing the accuracy of ensembles of different sizes.
FIGURE 1-3: Installing the rfpimp package in Python.

Book 4 Chapter 2

FIGURE 2-1: The separating line of a perceptron across two classes.
FIGURE 2-2: Learning logical XOR using a single separating line isn’t possible.
FIGURE 2-3: Plots of different activation functions.
FIGURE 2-4: An example of the architecture of a neural network.
FIGURE 2-5: A detail of the feed-forward process in a neural network.
FIGURE 2-6: Two interleaving moon-shaped clouds of data points.
FIGURE 2-7: How the ReLU activation function works in receiving and releasing s...
FIGURE 2-8: Dropout temporarily rules out 40 percent of neurons from the traini...

Book 4 Chapter 3

FIGURE 3-1: The image appears onscreen after you render and show it.
FIGURE 3-2: Different filters for different noise cleaning.
FIGURE 3-3: Cropping the image makes it smaller.
FIGURE 3-4: The example application would like to find similar photos.
FIGURE 3-5: The output shows the results that resemble the test image.
FIGURE 3-6: Examples from the training and test sets differ in pose and express...
FIGURE 3-7: Each pixel is read by the computer as a number in a matrix.
FIGURE 3-8: Only by translation invariance can an algorithm spot the dog and it...
FIGURE 3-9: Displaying some of the handwritten characters from MNIST.
FIGURE 3-10: A convolution processes a chunk of an image by matrix multiplicati...
FIGURE 3-11: The borders of an image are detected after applying a 3-x-3 pixel ...
FIGURE 3-12: A max pooling layer operating on chunks of a reduced image.
FIGURE 3-13: The architecture of LeNet5, a neural network for handwritten digit...
FIGURE 3-14: A plot of the LeNet5 network training process.
FIGURE 3-15: Processing a dog image using convolutions.
FIGURE 3-16: The content of an image is transformed by style transfer.

Book 5 Chapter 2

FIGURE 2-1: Some common image augmentations.
FIGURE 2-2: Some examples from the German Traffic Sign Recognition Benchmark.
FIGURE 2-3: Distribution of classes.
FIGURE 2-4: Training and validation errors compared.

Book 5 Chapter 3

FIGURE 3-1: Detection, localization, and segmentation example from the Coco dat...
FIGURE 3-2: Object detection resulting from Keras-RetinaNet.

Book 5 Chapter 4

FIGURE 4-1: A human might see a fanciful drawing.
FIGURE 4-2: The computer sees a series of numbers.
FIGURE 4-3: How a GAN operates.

Book 5 Chapter 5

FIGURE 5-1: Working with cyclic data that varies over time.

Book 5 Chapter 6

FIGURE 6-1: The output of a plain line graph.
FIGURE 6-2: The output of multiple datasets in a single line graph.
FIGURE 6-3: The output of multiple presentations in a single figure.
FIGURE 6-4: Allowing multiple revisions to a single output graphic.
FIGURE 6-5: The original figure changes as needed.
FIGURE 6-6: Modifying the plot ticks.
FIGURE 6-7: Adding grid lines to make data easier to read.
FIGURE 6-8: Making changes to a line as part of the plot or separately.
FIGURE 6-9: Adding markers to emphasize the data points.
FIGURE 6-10: Labels identify specific graphic elements.
FIGURE 6-11: Annotation provides the means of pointing something out.
FIGURE 6-12: Legends identify the individual grouped data elements.
FIGURE 6-13: Some plots really don’t say anything at all.
FIGURE 6-14: Differentiation makes the plots easier to interpret.
FIGURE 6-15: A scatterplot showing a high degree of negative correlation.
FIGURE 6-16: A scatterplot showing a high degree of positive correlation.
FIGURE 6-17: Using a general plot to display date-oriented data.
FIGURE 6-18: Using plot_date() to display date-oriented data.
FIGURE 6-19: The results of calculating a trend line for the airline passenger ...
FIGURE 6-20: An orthographic projection of the world.
FIGURE 6-21: Your maps can look quite realistic.
FIGURE 6-22: Some projections allow for a close look.
FIGURE 6-23: Adding locations or other information to the map.
FIGURE 6-24: Plotting the original graph.
FIGURE 6-25: Plotting the graph addition.

Book 6 Chapter 2

FIGURE 2-1: Descriptive statistics for a DataFrame.
FIGURE 2-2: Boxplots.
FIGURE 2-3: Reporting possibly outlying examples.
FIGURE 2-4: The first two and last two components from the PCA.
FIGURE 2-5: The possible outlying cases spotted by PCA.

Book 6 Chapter 3

FIGURE 3-1: Underfitting is the result of using a model that isn't complex enou...
FIGURE 3-2: Overfitting causes the model to follow the data too closely.
FIGURE 3-3: Applying the model to slightly different data shows the problem wit...
FIGURE 3-4: Using the correct degrees of polynomial fitting makes a big differe...
FIGURE 3-5: Nonlinear relationship between variable LSTAT and target prices.
FIGURE 3-6: Combined variables LSTAT and RM help to separate high from low pric...
FIGURE 3-7: Adding polynomial features increases the predictive power.

Book 6 Chapter 4

FIGURE 4-1: Pie charts show a percentage of the whole.
FIGURE 4-2: Bar charts make performing comparisons easier.
FIGURE 4-3: Histograms let you see distributions of numbers.
FIGURE 4-4: Use boxplots to present groups of numbers.
FIGURE 4-5: Use line graphs to show trends.
FIGURE 4-6: Use scatterplots to show groups of data points and their associated...
FIGURE 4-7: Load external code as needed to provide specific information for yo...
FIGURE 4-8: Embedding images can dress up your notebook presentation.

Introduction

Data science is a term that the media has chosen to minimize, obfuscate, and sometimes misuse. It involves a lot more than just data and the science of working with data. Today, the world uses data science in all sorts of ways that you might not know about, which is why you need Data Science Programming All-in-One For Dummies.

In the book, you start with both the data and the science of manipulating it, but then you go much further. In addition to seeing how to perform a wide range of analysis, you also delve into making recommendations, classifying real-world objects, analyzing audio, and even creating art.

However, you don’t just learn about amazing new technologies and how to perform common tasks. This book also dispels myths created by people who wish data science were something different from what it really is or who don’t understand it at all. A great deal of misinformation swirls around the world today as the media seeks to sensationalize, anthropomorphize, and emotionalize technologies that are, in fact, quite mundane. It’s hard to know what to believe. You find reports that robots are on the cusp of becoming sentient and that the giant tech companies can discover your innermost thoughts simply by reviewing your record of purchases. With this book, you can replace misinformation with solid facts, and you can use those facts to create a strategy for performing data science development tasks.

About This Book

You might find that this book starts off a little slowly because most people don’t have a good grasp on getting a system prepared for data science use. Book 1 helps you configure your system. The book uses Jupyter Notebook as an Integrated Development Environment (IDE) for both Python and R. That way, if you choose to view the examples in both languages, you use the same IDE to do it. Jupyter Notebook also relies on the literate programming strategy first proposed by Donald Knuth (see http://www.literateprogramming.com/) to make your coding efforts significantly easier and more focused on the data. In addition, in contrast to other environments, you don’t actually write entire applications before you see something; you write code and focus on the results of just that code block as part of a whole application.

After you have a development environment installed and ready to use, you can start working with data in all its myriad forms in Book 2. This book covers a great many of these forms — everything from in-memory datasets to those found on large websites. In addition, you see a number of data formats ranging from flat files to Relational Database Management Systems (RDBMSs) and Not Only SQL (NoSQL) databases.

Of course, manipulating data is worthwhile only if you can do something useful with it. Book 3 discusses common sorts of analysis, such as linear and logistic regression, Bayes’ Theorem, and K-Nearest Neighbors (KNN).
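To give you a feel for the kind of analysis Book 3 covers, here is a minimal sketch of simple linear regression written in plain Python. The function name fit_line and the sample data are made up for illustration and are not taken from the book's downloadable source; the book's own examples may well rely on libraries instead.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that follows score = 2 * hours + 50 exactly.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
print(f"score = {slope:.1f} * hours + {intercept:.1f}")
```

Because the sample points lie exactly on a line, the fit recovers that line; real data is noisier, which is where the techniques in Book 3 come in.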

Most data science books stop at this point. In this book, however, you discover AI, machine learning, and deep learning techniques to get more out of your data than you might have thought possible. This exciting part of the book, Book 4, represents the cutting edge of analysis. You use huge datasets to discover important information about large groups of people that will help you improve their health or sell them products.

Performing analysis may be interesting, but analysis is only a step along the path. Book 5 shows you how to put your analysis to use in recommender systems, classify objects, work with nontextual data such as music and video, and display the results of an analysis in a form that everyone can appreciate.

The final minibook, Book 6, offers something you won’t find in many places, not even online. You discover how to detect and fix problems with your data, the logic used to interpret the data, and the code used to perform tasks such as analysis. By the time you complete Book 6, you’ll know much more about how to ensure that the results you get are actually the results you need and want.

To make absorbing the concepts easy, this book uses the following conventions:

Text that you’re meant to type just as it appears in the book is in bold. The exception is when you’re working through a step list: Because each step is bold, the text to type is not bold.
When you see words in italics as part of a typing sequence, you need to replace that value with something that works for you. For example, if you see “Type Your Name and press Enter,” you need to replace Your Name with your actual name.
Web addresses and programming code appear in monofont. If you're reading a digital version of this book on a device connected to the Internet, you can click or tap the web address to visit that website, like this: https://www.dummies.com.
When you need to type command sequences, you see them separated by a special arrow, like this: File ⇒ New File. In this example, you go to the File menu first and then select the New File entry on that menu.

Foolish Assumptions

You might find it difficult to believe that we’ve assumed anything about you. After all, we haven’t even met you yet! Although most assumptions are indeed foolish, we made these assumptions to provide a starting point for the book.

You need to be familiar with the platform you want to use because the book doesn’t offer any guidance in this regard. (Book 1, Chapter 3 does, however, provide Anaconda installation instructions for both Python and R, and Book 1, Chapter 5 helps you install the TensorFlow and Keras frameworks used for this book.) To leave as much room as possible for showing how Python applies to deep learning, this book doesn’t discuss any platform-specific issues. You see the R version of the Python coding examples in the downloadable source, along with R-specific notes on usage and development. You really do need to know how to install applications, use applications, and generally work with your chosen platform before you begin working with this book.

You must know how to work with Python or R. You can find a wealth of Python tutorials online (see https://www.w3schools.com/python/ and https://www.tutorialspoint.com/python/ as examples). R, likewise, provides a wealth of online tutorials (see https://www.tutorialspoint.com/r/index.htm, https://docs.anaconda.com/anaconda/navigator/tutorials/r-lang/, and https://www.statmethods.net/r-tutorial/index.html as examples).

This book isn’t a math primer. Yes, you see many examples of complex math, but the emphasis is on helping you use Python or R to perform data science development tasks rather than teaching math theory. We include some examples that also discuss the use of technologies such as data management (see Book 2), statistical analysis (see Book 3), AI, machine learning, deep learning (see Book 4), practical data science application (see Book 5), and troubleshooting both data and code (see Book 6). Book 1, Chapters 1 and 2 give you a better understanding of precisely what you need to know to use this book successfully. You also use a considerable number of libraries in writing code for this book. Book 1, Chapter 4 discusses library use and suggests other libraries that you might want to try.

This book also assumes that you can access items on the Internet. Sprinkled throughout are numerous references to online material that will enhance your learning experience. However, these added sources are useful only if you actually find and use them.

Icons Used in This Book

As you read this book, you see icons in the margins that indicate material of interest (or not, as the case may be). This section briefly describes each icon in this book.

Tips are nice because they help you save time or perform some task without a lot of extra work. The tips in this book are time-saving techniques or pointers to resources that you should try so that you can get the maximum benefit from Python or R, or from performing deep learning–related tasks. (Note that R developers will also find copious notes in the source code files for issues that differ significantly from Python.)

We don’t want to sound like angry parents or some kind of maniacs, but you should avoid doing anything that’s marked with a Warning icon. Otherwise, you might find that your application fails to work as expected, you get incorrect answers from seemingly bulletproof algorithms, or (in the worst-case scenario) you lose data.

Whenever you see the Technical Stuff icon, think advanced tip or technique. You might find these tidbits of useful information just too boring for words, or they could contain the solution you need to get a program running. Skip these bits of information whenever you like.

If you don’t get anything else out of a particular chapter or section, remember the material marked by the Remember icon. This text usually contains an essential process or a bit of information that you must know to work with Python or R, or to perform deep learning–related tasks successfully. (Note that the R source code files contain a great deal of text that gives essential details for working with R when R differs considerably from Python.)

Beyond the Book

This book isn’t the end of your Python or R data science development experience — it’s really just the beginning. We provide online content to make this book more flexible and better able to meet your needs. That way, as we receive email from you, we can address questions and tell you how updates to Python, R, or their associated add-ons affect book content. In fact, you gain access to all these cool additions:

Cheat sheet: You remember using crib notes in school to make a better mark on a test, don’t you? You do? Well, a cheat sheet is sort of like that. It provides you with some special notes about tasks that you can do with Python and R with regard to data science development that not every other person knows. You can find the cheat sheet by going to www.dummies.com, searching this book's title, and scrolling down the page that appears. The cheat sheet contains really neat information, such as the most common data errors that cause people problems with working in the data science field.
Updates: Sometimes changes happen. For example, we might not have seen an upcoming change when we looked into our crystal ball during the writing of this book. In the past, this possibility simply meant that the book became outdated and less useful, but you can now find updates to the book, if we have any, by searching this book's title at www.dummies.com. In addition to these updates, check out the blog posts with answers to reader questions and demonstrations of useful, book-related techniques at http://blog.johnmuellerbooks.com/.
Companion files: Hey! Who really wants to type all the code in the book and reconstruct all those neural networks manually? Most readers prefer to spend their time actually working with data and seeing the interesting things they can do, rather than typing. Fortunately for you, the examples used in the book are available for download, so all you need to do is read the book to learn Python or R data science programming techniques. You can find these files at www.dummies.com. Search this book's title, and on the page that appears, scroll down to the image of the book cover and click it. Then click the More about This Book button and on the page that opens, go to the Downloads tab.

Where to Go from Here

It’s time to start your Python or R for data science programming adventure! If you’re completely new to Python or R and their use for data science tasks, you should start with Book 1, Chapter 1. Progressing through the book at a pace that allows you to absorb as much of the material as possible helps you gain insights that you might miss if you read the chapters in a random order. However, the book is designed to allow you to read the material in any order desired.

If you’re a novice who’s in an absolute rush to get going with Python or R for data science programming as quickly as possible, you can skip to Book 1, Chapter 3 with the understanding that you may find some topics a bit confusing later. Skipping to Book 1, Chapter 5 is okay if you already have Anaconda (the programming product used in the book) installed with the appropriate language (Python or R as you desire), but be sure to at least skim Chapter 3 so that you know what assumptions we made when writing this book.

This book relies on a combination of TensorFlow and Keras to perform deep learning tasks. Even if you’re an advanced reader who wants to perform deep learning tasks, you need to go to Book 1, Chapter 5 to discover how to configure the environment used for this book. You must configure the environment according to instructions or you’re likely to experience failures when you try to run the code. However, this issue applies only to deep learning. This book has a great deal to offer in other areas, such as data manipulation and statistical analysis.
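Before you run any deep learning code, you might want a quick way to see whether the frameworks are visible to your Python environment. The following is only a sketch, not the configuration procedure from Book 1, Chapter 5: the helper name check_packages is hypothetical, and the package names tensorflow and keras are assumed to be the import names in your setup. It uses the standard library's importlib.util.find_spec(), which locates a package without importing it.

```python
from importlib.util import find_spec


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


# Assumed import names; adjust them to match your own environment.
status = check_packages(["tensorflow", "keras"])
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

A result of "missing" tells you only that the package isn't installed in the environment that's running the code. It doesn't confirm that an installed package is the version the book's examples expect.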

Book 1

Defining Data Science
