Details

Machine Learning

a Concise Introduction
Wiley Series in Probability and Statistics, Volume 285, 1st edition

By: Steven W. Knox

76,99 €

Publisher: Wiley
Format: EPUB
Published: 15 March 2018
ISBN/EAN: 9781119438984
Language: English
Number of pages: 352

DRM-protected eBook; you will need, for example, Adobe Digital Editions and an Adobe ID to read it.

Description

AN INTRODUCTION TO MACHINE LEARNING THAT INCLUDES THE FUNDAMENTAL TECHNIQUES, METHODS, AND APPLICATIONS

PROSE Award Finalist 2019
Association of American Publishers Award for Professional and Scholarly Excellence

Machine Learning: a Concise Introduction offers a comprehensive introduction to the core concepts, approaches, and applications of machine learning. The author, an expert in the field, presents fundamental ideas, terminology, and techniques for solving applied problems in classification, regression, clustering, density estimation, and dimension reduction. The design principles behind the techniques are emphasized, including the bias-variance trade-off and its influence on the design of ensemble methods. Understanding these principles leads to more flexible and successful applications. Machine Learning: a Concise Introduction also includes methods for optimization, risk estimation, and model selection, essential elements of most applied projects. This important resource:

- Illustrates many classification methods with a single, running example, highlighting similarities and differences between methods
- Presents R source code which shows how to apply and interpret many of the techniques covered
- Includes many thoughtful exercises as an integral part of the text, with an appendix of selected solutions
- Contains useful information for effectively communicating with clients

A volume in the popular Wiley Series in Probability and Statistics, Machine Learning: a Concise Introduction offers the practical information needed for an understanding of the methods and application of machine learning.

STEVEN W. KNOX holds a Ph.D. in Mathematics from the University of Illinois and an M.S. in Statistics from Carnegie Mellon University. He has over twenty years' experience in using Machine Learning, Statistics, and Mathematics to solve real-world problems. He currently serves as Technical Director of Mathematics Research and Senior Advocate for Data Science at the National Security Agency.
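For orientation, the following is a minimal sketch, not taken from the book, of the kind of R classification workflow its Chapter 14 code illustrates: it fits linear discriminant analysis (the method of Sections 4.4.2 and 14.6) to a built-in R data set and reports held-out accuracy. The data set, train/test split, and function calls here are illustrative assumptions; the book's own running example and code differ.

# Illustrative only (not from the book): linear discriminant analysis,
# one of the classifiers surveyed in Chapter 4, applied with base R + MASS.
library(MASS)                                   # provides lda()

set.seed(1)
train <- sample(nrow(iris), 100)                # 100 training rows, 50 held out

fit  <- lda(Species ~ ., data = iris[train, ])  # fit LDA on the training data
pred <- predict(fit, iris[-train, ])$class      # classify the held-out rows

table(predicted = pred, actual = iris$Species[-train])  # confusion matrix
mean(pred == iris$Species[-train])                      # estimated accuracy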
Table of Contents

Preface xi
Organization – How to Use This Book xiii
Acknowledgments xvii
About the Companion Website xix

1 Introduction – Examples from Real Life 1

2 The Problem of Learning 3
  2.1 Domain 4
  2.2 Range 4
  2.3 Data 4
  2.4 Loss 6
  2.5 Risk 8
  2.6 The Reality of the Unknown Function 12
  2.7 Training and Selection of Models, and Purposes of Learning 12
  2.8 Notation 13

3 Regression 15
  3.1 General Framework 16
  3.2 Loss 17
  3.3 Estimating the Model Parameters 17
  3.4 Properties of Fitted Values 19
  3.5 Estimating the Variance 22
  3.6 A Normality Assumption 23
  3.7 Computation 24
  3.8 Categorical Features 25
  3.9 Feature Transformations, Expansions, and Interactions 27
  3.10 Variations in Linear Regression 28
  3.11 Nonparametric Regression 32

4 Survey of Classification Techniques 33
  4.1 The Bayes Classifier 34
  4.2 Introduction to Classifiers 37
  4.3 A Running Example 38
  4.4 Likelihood Methods 40
    4.4.1 Quadratic Discriminant Analysis 41
    4.4.2 Linear Discriminant Analysis 43
    4.4.3 Gaussian Mixture Models 45
    4.4.4 Kernel Density Estimation 47
    4.4.5 Histograms 51
    4.4.6 The Naive Bayes Classifier 54
  4.5 Prototype Methods 54
    4.5.1 k-Nearest-Neighbor 55
    4.5.2 Condensed k-Nearest-Neighbor 56
    4.5.3 Nearest-Cluster 56
    4.5.4 Learning Vector Quantization 58
  4.6 Logistic Regression 59
  4.7 Neural Networks 62
    4.7.1 Activation Functions 62
    4.7.2 Neurons 64
    4.7.3 Neural Networks 65
    4.7.4 Logistic Regression and Neural Networks 73
  4.8 Classification Trees 74
    4.8.1 Classification of Data by Leaves (Terminal Nodes) 74
    4.8.2 Impurity of Nodes and Trees 75
    4.8.3 Growing Trees 76
    4.8.4 Pruning Trees 79
    4.8.5 Regression Trees 81
  4.9 Support Vector Machines 81
    4.9.1 Support Vector Machine Classifiers 81
    4.9.2 Kernelization 88
    4.9.3 Proximal Support Vector Machine Classifiers 92
  4.10 Postscript: Example Problem Revisited 93

5 Bias–Variance Trade-off 97
  5.1 Squared-Error Loss 98
  5.2 Arbitrary Loss 101

6 Combining Classifiers 107
  6.1 Ensembles 107
  6.2 Ensemble Design 110
  6.3 Bootstrap Aggregation (Bagging) 112
  6.4 Bumping 115
  6.5 Random Forests 116
  6.6 Boosting 118
  6.7 Arcing 121
  6.8 Stacking and Mixture of Experts 121

7 Risk Estimation and Model Selection 127
  7.1 Risk Estimation via Training Data 128
  7.2 Risk Estimation via Validation or Test Data 128
    7.2.1 Training, Validation, and Test Data 128
    7.2.2 Risk Estimation 129
    7.2.3 Size of Training, Validation, and Test Sets 130
    7.2.4 Testing Hypotheses About Risk 131
    7.2.5 Example of Use of Training, Validation, and Test Sets 132
  7.3 Cross-Validation 133
  7.4 Improvements on Cross-Validation 135
  7.5 Out-of-Bag Risk Estimation 137
  7.6 Akaike's Information Criterion 138
  7.7 Schwartz's Bayesian Information Criterion 138
  7.8 Rissanen's Minimum Description Length Criterion 139
  7.9 R² and Adjusted R² 140
  7.10 Stepwise Model Selection 141
  7.11 Occam's Razor 142

8 Consistency 143
  8.1 Convergence of Sequences of Random Variables 144
  8.2 Consistency for Parameter Estimation 144
  8.3 Consistency for Prediction 145
  8.4 There Are Consistent and Universally Consistent Classifiers 145
  8.5 Convergence to Asymptopia Is Not Uniform and May Be Slow 147

9 Clustering 149
  9.1 Gaussian Mixture Models 150
  9.2 k-Means 150
  9.3 Clustering by Mode-Hunting in a Density Estimate 151
  9.4 Using Classifiers to Cluster 152
  9.5 Dissimilarity 153
  9.6 k-Medoids 153
  9.7 Agglomerative Hierarchical Clustering 154
  9.8 Divisive Hierarchical Clustering 155
  9.9 How Many Clusters Are There? Interpretation of Clustering 155
  9.10 An Impossibility Theorem 157

10 Optimization 159
  10.1 Quasi-Newton Methods 160
    10.1.1 Newton's Method for Finding Zeros 160
    10.1.2 Newton's Method for Optimization 161
    10.1.3 Gradient Descent 161
    10.1.4 The BFGS Algorithm 162
    10.1.5 Modifications to Quasi-Newton Methods 162
    10.1.6 Gradients for Logistic Regression and Neural Networks 163
  10.2 The Nelder–Mead Algorithm 166
  10.3 Simulated Annealing 168
  10.4 Genetic Algorithms 168
  10.5 Particle Swarm Optimization 169
  10.6 General Remarks on Optimization 170
    10.6.1 Imperfectly Known Objective Functions 170
    10.6.2 Objective Functions Which Are Sums 171
    10.6.3 Optimization from Multiple Starting Points 172
  10.7 The Expectation-Maximization Algorithm 173
    10.7.1 The General Algorithm 173
    10.7.2 EM Climbs the Marginal Likelihood of the Observations 173
    10.7.3 Example – Fitting a Gaussian Mixture Model Via EM 176
    10.7.4 Example – The Expectation Step 177
    10.7.5 Example – The Maximization Step 178

11 High-Dimensional Data 179
  11.1 The Curse of Dimensionality 180
  11.2 Two Running Examples 187
    11.2.1 Example 1: Equilateral Simplex 187
    11.2.2 Example 2: Text 187
  11.3 Reducing Dimension While Preserving Information 190
    11.3.1 The Geometry of Means and Covariances of Real Features 190
    11.3.2 Principal Component Analysis 192
    11.3.3 Working in "Dissimilarity Space" 193
    11.3.4 Linear Multidimensional Scaling 195
    11.3.5 The Singular Value Decomposition and Low-Rank Approximation 197
    11.3.6 Stress-Minimizing Multidimensional Scaling 199
    11.3.7 Projection Pursuit 199
    11.3.8 Feature Selection 201
    11.3.9 Clustering 202
    11.3.10 Manifold Learning 202
    11.3.11 Autoencoders 205
  11.4 Model Regularization 209
    11.4.1 Duality and the Geometry of Parameter Penalization 212
    11.4.2 Parameter Penalization as Prior Information 213

12 Communication with Clients 217
  12.1 Binary Classification and Hypothesis Testing 218
  12.2 Terminology for Binary Decisions 219
  12.3 ROC Curves 219
  12.4 One-Dimensional Measures of Performance 224
  12.5 Confusion Matrices 225
  12.6 Multiple Testing 226
    12.6.1 Control the Familywise Error 226
    12.6.2 Control the False Discovery Rate 227
  12.7 Expert Systems 228

13 Current Challenges in Machine Learning 231
  13.1 Streaming Data 231
  13.2 Distributed Data 231
  13.3 Semi-supervised Learning 232
  13.4 Active Learning 232
  13.5 Feature Construction via Deep Neural Networks 233
  13.6 Transfer Learning 233
  13.7 Interpretability of Complex Models 233

14 R Source Code 235
  14.1 Author's Biases 236
  14.2 Libraries 236
  14.3 The Running Example (Section 4.3) 237
  14.4 The Bayes Classifier (Section 4.1) 241
  14.5 Quadratic Discriminant Analysis (Section 4.4.1) 243
  14.6 Linear Discriminant Analysis (Section 4.4.2) 243
  14.7 Gaussian Mixture Models (Section 4.4.3) 244
  14.8 Kernel Density Estimation (Section 4.4.4) 245
  14.9 Histograms (Section 4.4.5) 248
  14.10 The Naive Bayes Classifier (Section 4.4.6) 253
  14.11 k-Nearest-Neighbor (Section 4.5.1) 255
  14.12 Learning Vector Quantization (Section 4.5.4) 257
  14.13 Logistic Regression (Section 4.6) 259
  14.14 Neural Networks (Section 4.7) 260
  14.15 Classification Trees (Section 4.8) 263
  14.16 Support Vector Machines (Section 4.9) 267
  14.17 Bootstrap Aggregation (Section 6.3) 272
  14.18 Boosting (Section 6.6) 274
  14.19 Arcing (Section 6.7) 275
  14.20 Random Forests (Section 6.5) 275

A List of Symbols 277
B Solutions to Selected Exercises 279
C Converting Between Normal Parameters and Level-Curve Ellipsoids 299
D Training Data and Fitted Parameters 301

References 305
Index 315

You might also be interested in these products:

Statistics for Microarrays
By: Ernst Wit, John McClure
PDF ebook
90,99 €