
Responsible Data Science




1st edition

by: Grant Fleming, Peter C. Bruce

25,99 €

Publisher: Wiley
Format: PDF
Published: 13.04.2021
ISBN/EAN: 9781119741770
Language: English
Number of pages: 304

DRM-protected eBook; you will need e.g. Adobe Digital Editions and an Adobe ID to read it.

Description

<p><b>Explore the most serious and prevalent ethical issues in data science with this insightful new resource</b></p> <p>The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of “black box” algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair.</p> <p><i>Responsible Data Science</i> delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk of undue harm to vulnerable members of society. Both data science practitioners and managers of analytics teams will learn how to:</p> <ul> <li>Improve model transparency, even for black box models</li> <li>Diagnose bias and unfairness within models using multiple metrics</li> <li>Audit projects to ensure fairness and minimize the possibility of unintended harm</li> </ul> <p>Perfect for data science practitioners, <i>Responsible Data Science</i> will also earn a spot on the bookshelves of technically inclined managers, software developers, and statisticians.</p>
<p>Introduction xix</p> <p><b>Part I Motivation for Ethical Data Science and Background Knowledge 1</b></p> <p><b>Chapter 1 Responsible Data Science 3</b></p> <p>The Optum Disaster 4</p> <p>Jekyll and Hyde 5</p> <p>Eugenics 7</p> <p>Galton, Pearson, and Fisher 7</p> <p>Ties between Eugenics and Statistics 7</p> <p>Ethical Problems in Data Science Today 9</p> <p>Predictive Models 10</p> <p>From Explaining to Predicting 10</p> <p>Predictive Modeling 11</p> <p>Setting the Stage for Ethical Issues to Arise 12</p> <p>Classic Statistical Models 12</p> <p>Black-Box Methods 14</p> <p>Important Concepts in Predictive Modeling 19</p> <p>Feature Selection 19</p> <p>Model-Centric vs. Data-Centric Models 20</p> <p>Holdout Sample and Cross-Validation 20</p> <p>Overfitting 21</p> <p>Unsupervised Learning 22</p> <p>The Ethical Challenge of Black Boxes 23</p> <p>Two Opposing Forces 24</p> <p>Pressure for More Powerful AI 24</p> <p>Public Resistance and Anxiety 24</p> <p>Summary 25</p> <p><b>Chapter 2 Background: Modeling and the Black-Box Algorithm 27</b></p> <p>Assessing Model Performance 27</p> <p>Predicting Class Membership 28</p> <p>The Rare Class Problem 28</p> <p>Lift and Gains 28</p> <p>Area Under the Curve 29</p> <p>AUC vs. Lift (Gains) 31</p> <p>Predicting Numeric Values 32</p> <p>Goodness-of-Fit 32</p> <p>Holdout Sets and Cross-Validation 33</p> <p>Optimization and Loss Functions 34</p> <p>Intrinsically Interpretable Models vs. 
Black-Box Models 35</p> <p>Ethical Challenges with Interpretable Models 38</p> <p>Black-Box Models 39</p> <p>Ensembles 39</p> <p>Nearest Neighbors 41</p> <p>Clustering 41</p> <p>Association Rules 42</p> <p>Collaborative Filters 42</p> <p>Artificial Neural Nets and Deep Neural Nets 43</p> <p>Problems with Black-Box Predictive Models 45</p> <p>Problems with Unsupervised Algorithms 47</p> <p>Summary 48</p> <p><b>Chapter 3 The Ways AI Goes Wrong, and the Legal Implications 49</b></p> <p>AI and Intentional Consequences by Design 50</p> <p>Deepfakes 50</p> <p>Supporting State Surveillance and Suppression 51</p> <p>Behavioral Manipulation 52</p> <p>Automated Testing to Fine-Tune Targeting 53</p> <p>AI and Unintended Consequences 55</p> <p>Healthcare 56</p> <p>Finance 57</p> <p>Law Enforcement 58</p> <p>Technology 60</p> <p>The Legal and Regulatory Landscape around AI 61</p> <p>Ignorance Is No Defense: AI in the Context of Existing Law and Policy 63</p> <p>A Finger in the Dam: Data Rights, Data Privacy, and Consumer Protection Regulations 64</p> <p>Trends in Emerging Law and Policy Related to AI 66</p> <p>Summary 69</p> <p><b>Part II The Ethical Data Science Process 71</b></p> <p><b>Chapter 4 The Responsible Data Science Framework 73</b></p> <p>Why We Keep Building Harmful AI 74</p> <p>Misguided Need for Cutting-Edge Models 74</p> <p>Excessive Focus on Predictive Performance 74</p> <p>Ease of Access and the Curse of Simplicity 76</p> <p>The Common Cause 76</p> <p>The Face Thieves 78</p> <p>An Anatomy of Modeling Harms 79</p> <p>The World: Context Matters for Modeling 80</p> <p>The Data: Representation Is Everything 83</p> <p>The Model: Garbage In, Danger Out 85</p> <p>Model Interpretability: Human Understanding for Superhuman Models 86</p> <p>Efforts Toward a More Responsible Data Science 89</p> <p>Principles Are the Focus 90</p> <p>Nonmaleficence 90</p> <p>Fairness 90</p> <p>Transparency 91</p> <p>Accountability 91</p> <p>Privacy 92</p> <p>Bridging the Gap Between 
Principles and Practice with the Responsible Data Science (RDS) Framework 92</p> <p>Justification 94</p> <p>Compilation 94</p> <p>Preparation 95</p> <p>Modeling 96</p> <p>Auditing 96</p> <p>Summary 97</p> <p><b>Chapter 5 Model Interpretability: The What and the Why 99</b></p> <p>The Sexist Résumé Screener 99</p> <p>The Necessity of Model Interpretability 101</p> <p>Connections Between Predictive Performance and Interpretability 103</p> <p>Uniting (High) Model Performance and Model Interpretability 105</p> <p>Categories of Interpretability Methods 107</p> <p>Global Methods 107</p> <p>Local Methods 113</p> <p>Real-World Successes of Interpretability Methods 113</p> <p>Facilitating Debugging and Audit 114</p> <p>Leveraging the Improved Performance of Black-Box Models 116</p> <p>Acquiring New Knowledge 116</p> <p>Addressing Critiques of Interpretability Methods 117</p> <p>Explanations Generated by Interpretability Methods Are Not Robust 118</p> <p>Explanations Generated by Interpretability Methods Are Low Fidelity 120</p> <p>The Forking Paths of Model Interpretability 121</p> <p>The Four-Measure Baseline 122</p> <p>Building Our Own Credit Scoring Model 124</p> <p>Using Train-Test Splits 125</p> <p>Feature Selection and Feature Engineering 125</p> <p>Baseline Models 127</p> <p>The Importance of Making Your Code Work for Everyone 129</p> <p>Execution Variability 129</p> <p>Addressing Execution Variability with Functionalized Code 130</p> <p>Stochastic Variability 130</p> <p>Addressing Stochastic Variability via Resampling 130</p> <p>Summary 133</p> <p><b>Part III EDS in Practice 135</b></p> <p><b>Chapter 6 Beginning a Responsible Data Science Project 137</b></p> <p>How the Responsible Data Science Framework Addresses the Common Cause 138</p> <p>Datasets Used 140</p> <p>Regression Datasets—Communities and Crime 140</p> <p>Classification Datasets—COMPAS 140</p> <p>Common Elements Across Our Analyses 141</p> <p>Project Structure and Documentation 141</p> <p>Project 
Structure for the Responsible Data Science Framework: Everything in Its Place 142</p> <p>Documentation: The Responsible Thing to Do 145</p> <p>Beginning a Responsible Data Science Project 151</p> <p>Communities and Crime (Regression) 151</p> <p>Justification 151</p> <p>Compilation 154</p> <p>Identifying Protected Classes 157</p> <p>Preparation—Data Splitting and Feature Engineering 159</p> <p>Datasheets 161</p> <p>COMPAS (Classification) 164</p> <p>Justification 164</p> <p>Compilation 166</p> <p>Identifying Protected Classes 168</p> <p>Preparation 169</p> <p>Summary 172</p> <p><b>Chapter 7 Auditing a Responsible Data Science Project 173</b></p> <p>Fairness and Data Science in Practice 175</p> <p>The Many Different Conceptions of Fairness 175</p> <p>Different Forms of Fairness Are Trade-Offs with Each Other 177</p> <p>Quantifying Predictive Fairness Within a Data Science Project 179</p> <p>Mitigating Bias to Improve Fairness 185</p> <p>Preprocessing 185</p> <p>In-processing 186</p> <p>Postprocessing 186</p> <p>Classification Example: COMPAS 187</p> <p>Prework: Code Practices, Modeling, and Auditing 187</p> <p>Justification, Compilation, and Preparation Review 189</p> <p>Modeling 191</p> <p>Auditing 200</p> <p>Per-Group Metrics: Overall 200</p> <p>Per-Group Metrics: Error 202</p> <p>Fairness Metrics 204</p> <p>Interpreting Our Models: Why Are They Unfair? 
207</p> <p>Analysis for Different Groups 209</p> <p>Bias Mitigation 214</p> <p>Preprocessing: Oversampling 214</p> <p>Postprocessing: Optimizing Thresholds Automatically 218</p> <p>Postprocessing: Optimizing Thresholds Manually 219</p> <p>Summary 223</p> <p><b>Chapter 8 Auditing for Neural Networks 225</b></p> <p>Why Neural Networks Merit Their Own Chapter 227</p> <p>Neural Networks Vary Greatly in Structure 227</p> <p>Neural Networks Treat Features Differently 229</p> <p>Neural Networks Repeat Themselves 231</p> <p>A More Impenetrable Black Box 232</p> <p>Baseline Methods 233</p> <p>Representation Methods 233</p> <p>Distillation Methods 234</p> <p>Intrinsic Methods 235</p> <p>Beginning a Responsible Neural Network Project 236</p> <p>Justification 236</p> <p>Moving Forward 239</p> <p>Compilation 239</p> <p>Tracking Experiments 241</p> <p>Preparation 244</p> <p>Modeling 245</p> <p>Auditing 247</p> <p>Per-Group Metrics: Overall 247</p> <p>Per-Group Metrics: Unusual Definitions of “False Positive” 248</p> <p>Fairness Metrics 249</p> <p>Interpreting Our Models: Why Are They Unfair? 252</p> <p>Bias Mitigation 253</p> <p>Wrap-Up 255</p> <p>Auditing Neural Networks for Natural Language Processing 258</p> <p>Identifying and Addressing Sources of Bias in NLP 258</p> <p>The Real World 259</p> <p>Data 260</p> <p>Models 261</p> <p>Model Interpretability 262</p> <p>Summary 262</p> <p><b>Chapter 9 Conclusion 265</b></p> <p>How Can We Do Better? 267</p> <p>The Responsible Data Science Framework 267</p> <p>Doing Better As Managers 269</p> <p>Doing Better As Practitioners 270</p> <p>A Better Future If We Can Keep It 271</p> <p>Index 273</p>
<p><b>GRANT FLEMING</b> is a Data Scientist at Elder Research Inc. His professional focus is on machine learning for social science applications, model interpretability, civic technology, and building software tools for reproducible data science.</p><p><b>PETER BRUCE</b> is the Senior Learning Officer at Elder Research, Inc., author of several best-selling texts on data science, and Founder of the Institute for Statistics Education at Statistics.com, an Elder Research Company.</p>
<p><b>A PRACTICAL GUIDE TO IDENTIFYING AND REDUCING BIAS AND UNFAIRNESS IN DATA SCIENCE</b></p><p>Rapid advancements in data science are causing increasing alarm around the world as governments, companies, other organizations, and individuals put new technologies to uses that were unimaginable just a decade ago. Medicine, finance, criminal justice, law enforcement, communication, marketing and other functions are all being transformed by the implementation of techniques and methods made possible by progressively more obscure manipulations of larger and larger data sets. Almost every day, new stories of AI gone awry appear. What can be done to avoid these issues?</p><p><i>Responsible Data Science</i> is an insightful and practical exploration of the ethical issues that arise when the newest AI technologies are applied to the largest and most sensitive data sets on the planet. The book walks you through how to implement and audit cutting-edge AI models in ways that minimize the risks of unanticipated harms. It combines detailed technical analysis with perceptive social observations to offer data scientists a real-world perspective on their field.</p><p>The inability to explain how an artificial intelligence model uses inputs can jeopardize the willingness of regulators to even consider whether these technologies comply with existing and future regulatory and legal requirements. 
In this book you’ll learn how to improve the interpretability of AI models and audit them to reduce bias and unfairness, thereby inspiring greater confidence in the minds of customers, employees, regulators, legislators, and other stakeholders.</p><p>Perfect for data science practitioners, statisticians, software engineers, and technically aware managers and solutions architects, <i>Responsible Data Science</i> will also earn a place in the libraries of regulators, lawyers, and policy makers whose decisions will determine how and when data solutions are implemented.</p><p><b>This groundbreaking book also covers:</b></p><ul><li><b>The various types of ethical challenges confronting modern-day data scientists</b></li><li><b>How the adoption of “black box” models can aggravate issues of model transparency, bias, and fairness</b></li><li><b>How moral concepts like fairness translate (or fail to translate) into a modeling context</b></li><li><b>How model-agnostic methods can be used to make models more interpretable, identify issues of bias, and mitigate the bias discovered</b></li></ul>

You might also be interested in these products:

MDX Solutions
by: George Spofford, Sivakumar Harinath, Christopher Webb, Dylan Hai Huang, Francesco Civardi
PDF ebook
53,99 €
Concept Data Analysis
by: Claudio Carpineto, Giovanni Romano
PDF ebook
107,99 €
Handbook of Virtual Humans
by: Nadia Magnenat-Thalmann, Daniel Thalmann
PDF ebook
150,99 €