Details

Statistical Analysis Techniques in Particle Physics



Fits, Density Estimation and Supervised Learning
1st ed.

by: Ilya Narsky, Frank C. Porter

99,99 €

Publisher: Wiley-VCH
Format: PDF
Published: 17 October 2013
ISBN/EAN: 9783527677313
Language: English
Number of pages: 459

DRM-protected eBook; to read it you need, for example, Adobe Digital Editions and an Adobe ID.

Descriptions

Modern analysis of HEP data needs advanced statistical tools to separate signal from background. This is the first book that focuses on machine learning techniques. It will be of interest to almost every high-energy physicist and, owing to its coverage, is also suitable for students.
1 Why We Wrote This Book and How You Should Read It
2 Parametric Likelihood Fits
  2.1 Preliminaries
  2.2 Parametric Likelihood Fits
  2.3 Fits for Small Statistics
  2.4 Results Near the Boundary of a Physical Region
  2.5 Likelihood Ratio Test for Presence of Signal
  2.6 sPlots
  2.7 Exercises
3 Goodness of Fit
  3.1 Binned Goodness of Fit Tests
  3.2 Statistics Converging to Chi-Square
  3.3 Univariate Unbinned Goodness of Fit Tests
  3.4 Multivariate Tests
  3.5 Exercises
4 Resampling Techniques
  4.1 Permutation Sampling
  4.2 Bootstrap
  4.3 Jackknife
  4.4 BCa Confidence Intervals
  4.5 Cross-Validation
  4.6 Resampling Weighted Observations
  4.7 Exercises
5 Density Estimation
  5.1 Empirical Density Estimate
  5.2 Histograms
  5.3 Kernel Estimation
  5.4 Ideogram
  5.5 Parametric vs. Nonparametric Density Estimation
  5.6 Optimization
  5.7 Estimating Errors
  5.8 The Curse of Dimensionality
  5.9 Adaptive Kernel Estimation
  5.10 Naive Bayes Classification
  5.11 Multivariate Kernel Estimation
  5.12 Estimation Using Orthogonal Series
  5.13 Using Monte Carlo Models
  5.14 Unfolding
    5.14.1 Unfolding: Regularization
6 Basic Concepts and Definitions of Machine Learning
  6.1 Supervised, Unsupervised, and Semi-Supervised
  6.2 Tall and Wide Data
  6.3 Batch and Online Learning
  6.4 Parallel Learning
  6.5 Classification and Regression
7 Data Preprocessing
  7.1 Categorical Variables
  7.2 Missing Values
  7.3 Outliers
  7.4 Exercises
8 Linear Transformations and Dimensionality Reduction
  8.1 Centering, Scaling, Reflection and Rotation
  8.2 Rotation and Dimensionality Reduction
  8.3 Principal Component Analysis (PCA)
    of Components
  8.4 Independent Component Analysis (ICA)
    8.4.1 Theory
  8.5 Exercises
9 Introduction to Classification
  9.1 Loss Functions: Hard Labels and Soft Scores
  9.2 Bias, Variance, and Noise
  9.3 Training, Validating and Testing: The Optimal Splitting Rule
  9.4 Resampling Techniques: Cross-Validation and Bootstrap
  9.5 Data with Unbalanced Classes
  9.6 Learning with Cost
  9.7 Exercises
10 Assessing Classifier Performance
  10.1 Classification Error and Other Measures of Predictive Power
  10.2 Receiver Operating Characteristic (ROC) and Other Curves
  10.3 Testing Equivalence of Two Classification Models
  10.4 Comparing Several Classifiers
  10.5 Exercises
11 Linear and Quadratic Discriminant Analysis, Logistic Regression, and Partial Least Squares Regression
  11.1 Discriminant Analysis
  11.2 Logistic Regression
  11.3 Classification by Linear Regression
  11.4 Partial Least Squares Regression
  11.5 Example: Linear Models for MAGIC Telescope Data
  11.6 Choosing a Linear Classifier for Your Analysis
  11.7 Exercises
12 Neural Networks
  12.1 Perceptrons
  12.2 The Feed-Forward Neural Network
  12.3 Backpropagation
  12.4 Bayes Neural Networks
  12.5 Genetic Algorithms
  12.6 Exercises
13 Local Learning and Kernel Expansion
  13.1 From Input Variables to the Feature Space
  13.2 Regularization
  13.3 Making and Choosing Kernels
  13.4 Radial Basis Functions
  13.5 Support Vector Machines (SVM)
  13.6 Empirical Local Methods
  13.7 Kernel Methods: The Good, the Bad and the Curse of Dimensionality
  13.8 Exercises
14 Decision Trees
  14.1 Growing Trees
  14.2 Predicting by Decision Trees
  14.3 Stopping Rules
  14.4 Pruning Trees
  14.5 Trees for Multiple Classes
  14.6 Splits on Categorical Variables
  14.7 Surrogate Splits
  14.8 Missing Values
  14.9 Variable Importance
  14.10 Why Are Decision Trees Good (or Bad)?
  14.11 Exercises
15 Ensemble Learning
  15.1 Boosting
  15.2 Diversifying the Weak Learner: Bagging, Random Subspace and Random Forest
  15.3 Choosing an Ensemble for Your Analysis
  15.4 Exercises
16 Reducing Multiclass to Binary
  16.1 Encoding
  16.2 Decoding
  16.3 Summary: Choosing the Right Design
17 How to Choose the Right Classifier for Your Analysis and Apply It Correctly
  17.1 Predictive Performance and Interpretability
  17.2 Matching Classifiers and Variables
  17.3 Using Classifier Predictions
  17.4 Optimizing Accuracy
  17.5 CPU and Memory Requirements
18 Methods for Variable Ranking and Selection
  18.1 Definitions
  18.2 Variable Ranking
    Elimination (SBE), and Feature-based Sensitivity of Posterior Probabilities (FSPP)
  18.3 Variable Selection (BECM)
  18.4 Exercises
19 Bump Hunting in Multivariate Data
  19.1 Voronoi Tessellation and SLEUTH Algorithm
  19.2 Identifying Box Regions by PRIM and Other Algorithms
  19.3 Bump Hunting Through Supervised Learning
20 Software Packages for Machine Learning
  20.1 Tools Developed in HEP
  20.2 R
  20.3 MATLAB
  20.4 Tools for Java and Python
  20.5 What Software Tool Is Right for You?
Appendix A: Optimization Algorithms
  A.1 Line Search
  A.2 Linear Programming (LP)
The authors are experts in the use of statistics in particle physics data analysis. Frank C. Porter is Professor of Physics at the California Institute of Technology and has lectured extensively at Caltech, the SLAC Laboratory at Stanford, and elsewhere. Ilya Narsky is Senior MATLAB Developer at The MathWorks, a leading developer of technical computing software for engineers and scientists, and the initiator of StatPatternRecognition, a C++ package for statistical analysis of HEP data. Together, they have taught courses for graduate students and postdocs.
Based on lectures given by the authors at Stanford and Caltech, this practical approach shows, by means of analysis examples, how observables are extracted from data, how signal and background are estimated, and how accurate error estimates are obtained using univariate and multivariate analysis techniques. The book includes simple code snippets that run on the popular software suite MATLAB. These snippets make use of publicly available datasets that can be downloaded from the Web.

Primarily aimed at PhD and very advanced undergraduate students, this text can also be used by researchers.

From the contents:

- Parametric likelihood fits
- Goodness of fit
- Resampling techniques
- Density estimation
- Data pre-processing
- Linear transformations and dimensionality reduction
- Introduction to classification
- Assessing classifier performance
- Linear classification
- Neural networks
- Local learning and kernel expansion
- Decision trees
- Ensemble learning
- Reducing multiclass to binary
- Methods for variable ranking and selection
