Details

The Data Science Handbook




1st edition

By: Field Cady

€44.99

Publisher: Wiley
Format: PDF
Published: January 20, 2017
ISBN/EAN: 9781119092933
Language: English
Number of pages: 416

DRM-protected eBook. To read it you will need, for example, Adobe Digital Editions and an Adobe ID.

Description

A comprehensive overview of data science covering the analytics, programming, and business skills necessary to master the discipline

Finding a good data scientist has been likened to hunting for a unicorn: the required combination of technical skills is simply very hard to find in one person. In addition, good data science is not just rote application of trainable skill sets; it requires the ability to think flexibly about all these areas and understand the connections between them. This book provides a crash course in data science, combining all the necessary skills into a unified discipline.

Unlike many analytics books, computer science and software engineering are given extensive coverage since they play such a central role in the daily work of a data scientist. The author also describes classic machine learning algorithms, from their mathematical foundations to real-world applications. Visualization tools are reviewed, and their central importance in data science is highlighted. Classical statistics is addressed to help readers think critically about the interpretation of data and its common pitfalls. The clear communication of technical results, which is perhaps the most undertrained of data science skills, is given its own chapter, and all topics are explained in the context of solving real-world data problems. The book also features:

• Extensive sample code and tutorials using Python™ along with its technical libraries
• Core technologies of “Big Data,” including their strengths and limitations and how they can be used to solve real-world problems
• Coverage of the practical realities of the tools, keeping theory to a minimum; however, when theory is presented, it is done in an intuitive way to encourage critical thinking and creativity
• A wide variety of case studies from industry
• Practical advice on the realities of being a data scientist today, including the overall workflow, where time is spent, the types of datasets worked on, and the skill sets needed

The Data Science Handbook is an ideal resource for data analysis methodology and big data software tools. The book is appropriate for people who want to practice data science, but lack the required skill sets. This includes software professionals who need to better understand analytics and statisticians who need to understand software. Modern data science is a unified discipline, and it is presented as such. This book is also an appropriate reference for researchers and entry-level graduate students who need to learn real-world analytics and expand their skill set.

FIELD CADY is the data scientist at the Allen Institute for Artificial Intelligence, where he develops tools that use machine learning to mine scientific literature. He has also worked at Google and several Big Data startups. He has a BS in physics and math from Stanford University, and an MS in computer science from Carnegie Mellon.
Preface xvii

1 Introduction: Becoming a Unicorn 1
1.1 Aren't Data Scientists Just Overpaid Statisticians? 2
1.2 How is This Book Organized? 3
1.3 How to Use This Book? 3
1.4 Why is It All in Python™, Anyway? 4
1.5 Example Code and Datasets 4
1.6 Parting Words 5

Part I The Stuff You'll Always Use 7

2 The Data Science Road Map 9
2.1 Frame the Problem 10
2.2 Understand the Data: Basic Questions 11
2.3 Understand the Data: Data Wrangling 12
2.4 Understand the Data: Exploratory Analysis 13
2.5 Extract Features 14
2.6 Model 15
2.7 Present Results 15
2.8 Deploy Code 16
2.9 Iterating 16
2.10 Glossary 17

3 Programming Languages 19
3.1 Why Use a Programming Language? What are the Other Options? 19
3.2 A Survey of Programming Languages for Data Science 20
3.2.1 Python 20
3.2.2 R 21
3.2.3 MATLAB® and Octave 21
3.2.4 SAS® 21
3.2.5 Scala® 22
3.3 Python Crash Course 22
3.3.1 A Note on Versions 22
3.3.2 "Hello World" Script 23
3.3.3 More Complicated Script 23
3.3.4 Atomic Data Types 26
3.4 Strings 27
3.4.1 Comments and Docstrings 28
3.4.2 Complex Data Types 29
3.4.3 Lists 29
3.4.4 Strings and Lists 30
3.4.5 Tuples 31
3.4.6 Dictionaries 31
3.4.7 Sets 32
3.5 Defining Functions 32
3.5.1 For Loops and Control Structures 33
3.5.2 A Few Key Functions 34
3.5.3 Exception Handling 35
3.5.4 Libraries 35
3.5.5 Classes and Objects 35
3.5.6 GOTCHA: Hashable and Unhashable Types 36
3.6 Python's Technical Libraries 37
3.6.1 Data Frames 38
3.6.2 Series 39
3.6.3 Joining and Grouping 40
3.7 Other Python Resources 42
3.8 Further Reading 42
3.9 Glossary 43
3a Interlude: My Personal Toolkit 45

4 Data Munging: String Manipulation, Regular Expressions, and Data Cleaning 47
4.1 The Worst Dataset in the World 48
4.2 How to Identify Pathologies 48
4.3 Problems with Data Content 49
4.3.1 Duplicate Entries 49
4.3.2 Multiple Entries for a Single Entity 49
4.3.3 Missing Entries 49
4.3.4 NULLs 50
4.3.5 Huge Outliers 50
4.3.6 Out-of-Date Data 50
4.3.7 Artificial Entries 50
4.3.8 Irregular Spacings 51
4.4 Formatting Issues 51
4.4.1 Formatting is Irregular between Different Tables/Columns 51
4.4.2 Extra Whitespace 51
4.4.3 Irregular Capitalization 52
4.4.4 Inconsistent Delimiters 52
4.4.5 Irregular NULL Format 52
4.4.6 Invalid Characters 52
4.4.7 Weird or Incompatible Datetimes 52
4.4.8 Operating System Incompatibilities 53
4.4.9 Wrong Software Versions 53
4.5 Example Formatting Script 54
4.6 Regular Expressions 55
4.6.1 Regular Expression Syntax 56
4.7 Life in the Trenches 60
4.8 Glossary 60

5 Visualizations and Simple Metrics 61
5.1 A Note on Python's Visualization Tools 62
5.2 Example Code 62
5.3 Pie Charts 63
5.4 Bar Charts 65
5.5 Histograms 66
5.6 Means, Standard Deviations, Medians, and Quantiles 69
5.7 Boxplots 70
5.8 Scatterplots 72
5.9 Scatterplots with Logarithmic Axes 74
5.10 Scatter Matrices 76
5.11 Heatmaps 77
5.12 Correlations 78
5.13 Anscombe's Quartet and the Limits of Numbers 80
5.14 Time Series 81
5.15 Further Reading 85
5.16 Glossary 85

6 Machine Learning Overview 87
6.1 Historical Context 88
6.2 Supervised versus Unsupervised 89
6.3 Training Data, Testing Data, and the Great Boogeyman of Overfitting 89
6.4 Further Reading 91
6.5 Glossary 91

7 Interlude: Feature Extraction Ideas 93
7.1 Standard Features 93
7.2 Features That Involve Grouping 94
7.3 Preview of More Sophisticated Features 95
7.4 Defining the Feature You Want to Predict 95

8 Machine Learning Classification 97
8.1 What is a Classifier, and What Can You Do with It? 97
8.2 A Few Practical Concerns 98
8.3 Binary versus Multiclass 99
8.4 Example Script 99
8.5 Specific Classifiers 101
8.5.1 Decision Trees 101
8.5.2 Random Forests 103
8.5.3 Ensemble Classifiers 104
8.5.4 Support Vector Machines 105
8.5.5 Logistic Regression 108
8.5.6 Lasso Regression 110
8.5.7 Naive Bayes 110
8.5.8 Neural Nets 112
8.6 Evaluating Classifiers 114
8.6.1 Confusion Matrices 114
8.6.2 ROC Curves 115
8.6.3 Area under the ROC Curve 116
8.7 Selecting Classification Cutoffs 117
8.7.1 Other Performance Metrics 118
8.7.2 Lift–Reach Curves 118
8.8 Further Reading 119
8.9 Glossary 119

9 Technical Communication and Documentation 121
9.1 Several Guiding Principles 122
9.1.1 Know Your Audience 122
9.1.2 Show Why It Matters 122
9.1.3 Make It Concrete 123
9.1.4 A Picture is Worth a Thousand Words 123
9.1.5 Don't Be Arrogant about Your Tech Knowledge 124
9.1.6 Make It Look Decent 124
9.2 Slide Decks 124
9.2.1 C.R.A.P. Design 125
9.2.2 A Few Tips and Rules of Thumb 127
9.3 Written Reports 128
9.4 Speaking: What Has Worked for Me 130
9.5 Code Documentation 131
9.6 Further Reading 132
9.7 Glossary 132

Part II Stuff You Still Need to Know 133

10 Unsupervised Learning: Clustering and Dimensionality Reduction 135
10.1 The Curse of Dimensionality 136
10.2 Example: Eigenfaces for Dimensionality Reduction 138
10.3 Principal Component Analysis and Factor Analysis 140
10.4 Scree Plots and Understanding Dimensionality 142
10.5 Factor Analysis 143
10.6 Limitations of PCA 143
10.7 Clustering 144
10.7.1 Real-World Assessment of Clusters 144
10.7.2 k-Means Clustering 145
10.7.3 Gaussian Mixture Models 146
10.7.4 Agglomerative Clustering 147
10.7.5 Evaluating Cluster Quality 148
10.7.6 Silhouette Score 148
10.7.7 Rand Index and Adjusted Rand Index 149
10.7.8 Mutual Information 150
10.8 Further Reading 151
10.9 Glossary 151

11 Regression 153
11.1 Example: Predicting Diabetes Progression 153
11.2 Least Squares 156
11.3 Fitting Nonlinear Curves 157
11.4 Goodness of Fit: R² and Correlation 159
11.5 Correlation of Residuals 160
11.6 Linear Regression 161
11.7 LASSO Regression and Feature Selection 162
11.8 Further Reading 164
11.9 Glossary 164

12 Data Encodings and File Formats 165
12.1 Typical File Format Categories 165
12.1.1 Text Files 166
12.1.2 Dense Numerical Arrays 166
12.1.3 Program-Specific Data Formats 166
12.1.4 Compressed or Archived Data 166
12.2 CSV Files 167
12.3 JSON Files 168
12.4 XML Files 170
12.5 HTML Files 172
12.6 Tar Files 174
12.7 GZip Files 175
12.8 Zip Files 175
12.9 Image Files: Rasterized, Vectorized, and/or Compressed 176
12.10 It's All Bytes at the End of the Day 177
12.11 Integers 178
12.12 Floats 179
12.13 Text Data 180
12.14 Further Reading 183
12.15 Glossary 183

13 Big Data 185
13.1 What is Big Data? 185
13.2 Hadoop: The File System and the Processor 187
13.3 Using HDFS 188
13.4 Example PySpark Script 189
13.5 Spark Overview 190
13.6 Spark Operations 192
13.7 Two Ways to Run PySpark 193
13.8 Configuring Spark 194
13.9 Under the Hood 195
13.10 Spark Tips and Gotchas 196
13.11 The MapReduce Paradigm 197
13.12 Performance Considerations 199
13.13 Further Reading 200
13.14 Glossary 200

14 Databases 203
14.1 Relational Databases and MySQL® 204
14.1.1 Basic Queries and Grouping 204
14.1.2 Joins 207
14.1.3 Nesting Queries 208
14.1.4 Running MySQL and Managing the DB 209
14.2 Key-Value Stores 210
14.3 Wide Column Stores 211
14.4 Document Stores 211
14.4.1 MongoDB® 212
14.5 Further Reading 214
14.6 Glossary 214

15 Software Engineering Best Practices 217
15.1 Coding Style 217
15.2 Version Control and Git for Data Scientists 220
15.3 Testing Code 222
15.3.1 Unit Tests 223
15.3.2 Integration Tests 224
15.4 Test-Driven Development 225
15.5 AGILE Methodology 225
15.6 Further Reading 226
15.7 Glossary 226

16 Natural Language Processing 229
16.1 Do I Even Need NLP? 229
16.2 The Great Divide: Language versus Statistics 230
16.3 Example: Sentiment Analysis on Stock Market Articles 230
16.4 Software and Datasets 232
16.5 Tokenization 233
16.6 Central Concept: Bag-of-Words 233
16.7 Word Weighting: TF-IDF 235
16.8 n-Grams 235
16.9 Stop Words 236
16.10 Lemmatization and Stemming 236
16.11 Synonyms 237
16.12 Part of Speech Tagging 237
16.13 Common Problems 238
16.13.1 Search 238
16.13.2 Sentiment Analysis 239
16.13.3 Entity Recognition and Topic Modeling 240
16.14 Advanced NLP: Syntax Trees, Knowledge, and Understanding 240
16.15 Further Reading 241
16.16 Glossary 242

17 Time Series Analysis 243
17.1 Example: Predicting Wikipedia Page Views 244
17.2 A Typical Workflow 247
17.3 Time Series versus Time-Stamped Events 248
17.4 Resampling and Interpolation 249
17.5 Smoothing Signals 251
17.6 Logarithms and Other Transformations 252
17.7 Trends and Periodicity 252
17.8 Windowing 253
17.9 Brainstorming Simple Features 254
17.10 Better Features: Time Series as Vectors 255
17.11 Fourier Analysis: Sometimes a Magic Bullet 256
17.12 Time Series in Context: The Whole Suite of Features 259
17.13 Further Reading 259
17.14 Glossary 260

18 Probability 261
18.1 Flipping Coins: Bernoulli Random Variables 261
18.2 Throwing Darts: Uniform Random Variables 263
18.3 The Uniform Distribution and Pseudorandom Numbers 263
18.4 Nondiscrete, Noncontinuous Random Variables 265
18.5 Notation, Expectations, and Standard Deviation 267
18.6 Dependence, Marginal and Conditional Probability 268
18.7 Understanding the Tails 269
18.8 Binomial Distribution 271
18.9 Poisson Distribution 272
18.10 Normal Distribution 272
18.11 Multivariate Gaussian 273
18.12 Exponential Distribution 274
18.13 Log-Normal Distribution 276
18.14 Entropy 277
18.15 Further Reading 279
18.16 Glossary 279

19 Statistics 281
19.1 Statistics in Perspective 281
19.2 Bayesian versus Frequentist: Practical Tradeoffs and Differing Philosophies 282
19.3 Hypothesis Testing: Key Idea and Example 283
19.4 Multiple Hypothesis Testing 285
19.5 Parameter Estimation 286
19.6 Hypothesis Testing: t-Test 287
19.7 Confidence Intervals 290
19.8 Bayesian Statistics 291
19.9 Naive Bayesian Statistics 293
19.10 Bayesian Networks 293
19.11 Choosing Priors: Maximum Entropy or Domain Knowledge 294
19.12 Further Reading 295
19.13 Glossary 295

20 Programming Language Concepts 297
20.1 Programming Paradigms 297
20.1.1 Imperative 298
20.1.2 Functional 298
20.1.3 Object-Oriented 301
20.2 Compilation and Interpretation 305
20.3 Type Systems 307
20.3.1 Static versus Dynamic Typing 308
20.3.2 Strong versus Weak Typing 308
20.4 Further Reading 309
20.5 Glossary 309

21 Performance and Computer Memory 311
21.1 Example Script 311
21.2 Algorithm Performance and Big-O Notation 314
21.3 Some Classic Problems: Sorting a List and Binary Search 315
21.4 Amortized Performance and Average Performance 318
21.5 Two Principles: Reducing Overhead and Managing Memory 320
21.6 Performance Tip: Use Numerical Libraries When Applicable 322
21.7 Performance Tip: Delete Large Structures You Don't Need 323
21.8 Performance Tip: Use Built-In Functions When Possible 324
21.9 Performance Tip: Avoid Superfluous Function Calls 324
21.10 Performance Tip: Avoid Creating Large New Objects 325
21.11 Further Reading 325
21.12 Glossary 325

Part III Specialized or Advanced Topics 327

22 Computer Memory and Data Structures 329
22.1 Virtual Memory, the Stack, and the Heap 329
22.2 Example C Program 330
22.3 Data Types and Arrays in Memory 330
22.4 Structs 332
22.5 Pointers, the Stack, and the Heap 333
22.6 Key Data Structures 337
22.6.1 Strings 337
22.6.2 Adjustable-Size Arrays 338
22.6.3 Hash Tables 339
22.6.4 Linked Lists 340
22.6.5 Binary Search Trees 342
22.7 Further Reading 343
22.8 Glossary 343

23 Maximum Likelihood Estimation and Optimization 345
23.1 Maximum Likelihood Estimation 345
23.2 A Simple Example: Fitting a Line 346
23.3 Another Example: Logistic Regression 348
23.4 Optimization 348
23.5 Gradient Descent and Convex Optimization 350
23.6 Convex Optimization 353
23.7 Stochastic Gradient Descent 355
23.8 Further Reading 355
23.9 Glossary 356

24 Advanced Classifiers 357
24.1 A Note on Libraries 358
24.2 Basic Deep Learning 358
24.3 Convolutional Neural Networks 361
24.4 Different Types of Layers. What the Heck is a Tensor? 362
24.5 Example: The MNIST Handwriting Dataset 363
24.6 Recurrent Neural Networks 366
24.7 Bayesian Networks 367
24.8 Training and Prediction 369
24.9 Markov Chain Monte Carlo 369
24.10 PyMC Example 370
24.11 Further Reading 373
24.12 Glossary 373

25 Stochastic Modeling 375
25.1 Markov Chains 375
25.2 Two Kinds of Markov Chain, Two Kinds of Questions 377
25.3 Markov Chain Monte Carlo 379
25.4 Hidden Markov Models and the Viterbi Algorithm 380
25.5 The Viterbi Algorithm 382
25.6 Random Walks 384
25.7 Brownian Motion 384
25.8 ARIMA Models 385
25.9 Continuous-Time Markov Processes 386
25.10 Poisson Processes 387
25.11 Further Reading 388
25.12 Glossary 388
25a Parting Words: Your Future as a Data Scientist 391

Index 393

You might also be interested in these products:

Statistics for Microarrays
By: Ernst Wit, John McClure
PDF ebook
€90.99