Details

Data Smart

Using Data Science to Transform Information into Insight
2. Aufl.

von: Jordan Goldmeier
32,99 €
Verlag:	Wiley
Format:	EPUB
Veröffentl.:	22.09.2023
ISBN/EAN:	9781119931393
Sprache:	englisch
Anzahl Seiten:	448

In den Warenkorb

Als Gutschein

DRM-geschütztes eBook, Sie benötigen z.B. Adobe Digital Editions und eine Adobe ID zum Lesen.

Beschreibungen

Titelbeschreibung

Want to jump into data science but don't know where to start? Let's be real, data science is presented as something mystical and unattainable without the most powerful software, hardware, and data expertise. Real data science isn't about technology. It's about how you approach the problem. In this updated edition of Data Smart: Using Data Science to Transform Information into Insight, award-winning data scientist and bestselling author Jordan Goldmeier shows you how to implement data science problems using Excel while exposing how things work behind the scenes. Data Smart is your field guide to building statistics, machine learning, and powerful artificial intelligence concepts right inside your spreadsheet. Inside you'll find: <ul> <li>Four-color data visualizations that highlight and illustrate the concepts discussed in the book</li> <li>Tutorials explaining complicated data science using just Microsoft Excel</li> <li>How to take what you've learned and apply it to everyday problems at work and life</li> <li>Advice for using formulas, Power Query, and some of Excel's latest features to solve tough data problems</li> <li>Smart data science solutions for common business challenges</li> <li>Explanations of what algorithms do, how they work, and what you can tweak to take your Excel skills to the next level</li> </ul> Data Smart is a must-read for students, analysts, and managers ready to become data science savvy and share their findings with the world.

Inhaltsverzeichnis

Introduction xix 1 Everything You Ever Needed to Know About Spreadsheets but Were Too Afraid to Ask 1 Some Sample Data 2 Accessing Quick Descriptive Statistics 3 Excel Tables 4 Filtering and Sorting 5 Table Formatting 7 Structured References 7 Adding Table Columns 10 Lookup Formulas 11 VLOOKUP 11 INDEX/MATCH 13 XLOOKUP 15 PivotTables 16 Using Array Formulas 19 Solving Stuff with Solver 20 2 Set It and Forget It: An Introduction to Power Query 27 What Is Power Query? 27 Sample Data 28 Starting Power Query 29 Filtering Rows 32 Removing Columns 33 Find & Replace 34 Close & Load to Table 35 3 Naïve Bayes and the Incredible Lightness of Being an Idiot 39 The World's Fastest Intro to Probability Theory 39 Totaling Conditional Probabilities 40 Joint Probability, the Chain Rule, and Independence 40 What Happens in a Dependent Situation? 41 Bayes Rule 42 Separating the Signal and the Noise 43 Using the Bayes Rule to Create an AI Model 44 High-Level Class Probabilities Are Often Assumed to Be Equal 45 A Couple More Odds and Ends 46 Let's Get This Excel Party Started 47 Cleaning the Data with Power Query 48 Splitting on Spaces: Giving Each Word Its Due 50 Counting Tokens and Calculating Probabilities 55 We Have a Model! Let's Use It 58 4 Cluster Analysis Part 1: Using K-Means to Segment Your Customer Base 65 Dances at Summer Camp 65 Getting Real: K-Means Clustering Subscribers in Email Marketing 70 The Initial Dataset 71 Determining What to Measure 72 Start with Four Clusters 75 Euclidean Distance: Measuring Distances as the Crow Flies 76 Solving for the Cluster Centers 80 Making Sense of the Results 82 Getting the Top Deals by Cluster 83 The Silhouette: A Good Way to Let Different K Values Duke It Out 86 How About Five Clusters? 95 Solving for Five Clusters 96 Getting the Top Deals for All Five Clusters 96 Computing the Silhouette for 5-Means Clustering 99 K-Medians Clustering and Asymmetric Distance Measurements 100 Using K-Medians Clustering 100 Getting a More Appropriate Distance Metric 100 Putting It All in Excel 102 The Top Deals for the 5-Medians Clusters 104 5 Cluster Analysis Part II: Network Graphs and Community Detection 109 What Is a Network Graph? 110 Visualizing a Simple Graph 110 Beyond GiGraph and Adjacency Lists 115 Building a Graph from the Wholesale Wine Data 117 Creating a Cosine Similarity Matrix 118 Producing an R-Neighborhood Graph 121 Introduction to Gephi 123 Creating a Static Adjacency Matrix 124 Bringing in Your R-Neighborhood Adjacency Matrix into Gephi 124 Node Degree 128 Touching the Graph Data 130 How Much Is an Edge Worth? Points and Penalties in Graph Modularity 132 What's a Point, and What's a Penalty? 133 Setting Up the Score Sheet 136 Let's Get Clustering! 138 Split Number 1 138 Split 2: Electric Boogaloo 143 And. . .Split3: Split with a Vengeance 145 Encoding and Analyzing the Communities 146 There and Back Again: A Gephi Tale 151 6 Regression: The Granddaddy of Supervised Artificial Intelligence 157 Predicting Pregnant Customers at RetailMart Using Linear Regression 158 The Feature Set 159 Assembling the Training Data 161 Creating Dummy Variables 163 Let's Bake Our Own Linear Regression 165 Linear Regression Statistics: R-Squared, F-Tests, t-Tests 173 Making Predictions on Some New Data and Measuring Performance 182 Predicting Pregnant Customers at RetailMart Using Logistic Regression 192 First You Need a Link Function 192 Hooking Up the Logistic Function and Reoptimizing 193 Baking an Actual Logistic Regression 196 7 Ensemble Models: A Whole Lot of Bad Pizza 203 Getting Started Using the Data from Chapter 6 203 Bagging: Randomize, Train, Repeat 204 Decision Stump is Another Name for a Weak Learner 204 Doesn't Seem So Weak to Me! 204 You Need More Power! 207 Let's Train It 208 Evaluating the Bagged Model 220 Boosting: If You Get It Wrong, Just Boost and Try Again 223 Training the Model—Every Feature Gets a Shot 224 Evaluating the Boosted Model 231 8 Forecasting: Breathe Easy: You Can't Win 235 The Sword Trade Is Hopping 236 Getting Acquainted with Time-Series Data 236 Starting Slow with Simple Exponential Smoothing 238 Setting Up the Simple Exponential Smoothing Forecast 240 You Might Have a Trend 249 Holt's Trend-Corrected Exponential Smoothing 250 Setting Up Holt's Trend-Corrected Smoothing in a Spreadsheet 252 So Are You Done? Looking at Autocorrelations 258 Multiplicative Holt-Winters Exponential Smoothing 266 Setting the Initial Values for Level, Trend, and Seasonality 268 Getting Rolling on the Forecast 274 And. . .Optimize! 280 Putting a Prediction Interval Around the Forecast 283 Creating a Fan Chart for Effect 287 Forecast Sheets in Excel 289 9 Optimization Modeling: Because That "Fresh-Squeezed" Orange Juice Ain't Gonna Blend Itself 293 Wait Is This Data Science? 294 Starting with a Simple Trade-Off 295 Representing the Problem as a Polytope 296 Solving by Sliding the Level Set 297 The Simplex Method: Rooting Around the Corners 298 Working in Excel 300 Fresh from the Grove to Your Glass with a Pit Stop Through a Blending Model 305 Let's Start with Some Specs 307 Coming Back to Consistency 308 Putting the Data into Excel 309 Setting Up the Problem in Solver 311 Lowering Your Standards 314 Dead Squirrel Removal: the Minimax Formulation 317 If-Then and the "Big M" Constraint 320 Multiplying Variables: Cranking Up the Volume to 11,000 324 Modeling Risk 330 Normally Distributed Data 331 10 Outlier Detection: Just Because They're Odd Doesn't Mean They're Unimportant 339 Outliers Are (Bad?) People, Too 340 The Fascinating Case of Hadlum v Hadlum 340 Tukey's Fences 341 Applying Tukey's Fences in a Spreadsheet 342 The Limitations of This Simple Approach 345 Terrible at Nothing, Bad at Everything 346 Preparing Data for Graphing 347 Creating a Graph 350 Getting the k-Nearest Neighbors 351 Graph Outlier Detection Method 1: Just Use the Indegree 352 Graph Outlier Detection Method 2: Getting Nuanced with k-Distance 355 Graph Outlier Detection Method 3: Local Outlier Factors Are Where It's At 358 11 Moving on From Spreadsheets 363 Getting Up and Running with R 364 A Crash Course in R-ing 366 Show Me the Numbers! Vector Math and Factoring 367 The Best Data Type of Them All: the Dataframe 370 How to Ask for Help in R 371 It Gets Even Better Beyond Base R 372 Doing Some Actual Data Science 374 Reading Data into R 374 Spherical K-Means on Wine Data in Just a Few Lines 375 Building AI Models on the Pregnancy Data 381 Forecasting in R 389 Looking at Outlier Detection 393 12 Conclusion 397 Where Am I? What Just Happened? 397 Before You Go-Go 397 Get to Know the Problem 398 We Need More Translators 398 Beware the Three-Headed Geek-Monster: Tools, Performance, and Mathematical Perfection 399 You Are Not the Most Important Function of Your Organization 401 Get Creative and Keep in Touch! 402 Index 403

Autorenportrait

JORDAN GOLDMEIER is an award-winning author in analytics, data science, and data visualization, and 11-time Microsoft MVP winner. Jordan has served analytics solutions for global organizations like NATO, The World Bank and Habitat for Humanity, and Fortune 500 companies likes Principal Financial and H&M. He has taught as an instructor for Wake Forest University, and served as a volunteer Emergency Medical Technician in New York City.

Back cover copy

Jordan breaks down advanced concepts with an infectious simplicity that has become his signature style. "Through the data-driven AI hype, Data Smart injects clarity into the discussion with fresh, compelling examples. This is an indispensable guide to becoming truly data smart. The second edition continues the mission of the first—to be a beacon of clarity against the clamor of data science hysteria. Ready to usher in a new generation of analysts, this is your ticket to becoming truly data smart." —Alex Gutman, PhD, Director of Data Science, Author of Becoming a Data Head An absolute gem. "Data Smart solidifies Excel's enduring relevance in the age of AI. Analysts who embrace its full potential will continue to thrive." —George Mount, Excel MVP, Founder at Stringfest Analytics, Author of Advancing into Analytics Most people are approaching data all wrong. Here's how to do it right. In the era of AI, data scientists appear as mystical practitioners of magical arts. But I'm here to tell you: Data science is something you can do. Really. This book shows you the significant data science techniques, how they work, how to use them, and how they benefit your business, large or small. It's about turning raw data into insight you can act upon and doing it as quickly and painlessly as possible. Roll up your sleeves and let's get going. Relax — it's just Excel In this newly revised edition featuring four-color visualizations, you'll learn about <ul> <li>Cutting-edge Excel tools Power Query and new functions like XLOOKUP, LET and LAMBDA</li> <li>Creating artificial intelligence using linear models, ensemble methods, and naïve Bayes</li> <li>Mathematical optimization, including non-linear programming and genetic algorithms</li> <li>Prediction with time-series data and forecasting with exponential smoothing</li> <li>Clustering with k-means, spherical k-means, and graph modularity</li> <li>Quantifying and addressing risk with Monte Carlo simulation</li> <li>Detecting outliers in single or multiple dimensions</li> <li>Statistical programming with R</li> </ul>