Cover Page

THE REAL WORK OF DATA SCIENCE

TURNING DATA INTO INFORMATION, BETTER DECISIONS, AND STRONGER ORGANIZATIONS



Ron S. Kenett

Ra'anana, Israel


Thomas C. Redman

Rumson, NJ, USA






Praise for The Real Work of Data Science

These two authors are world‐class experts on analytics, data management, and data quality; they’ve forgotten more about these topics than most of us will ever know. Their book is pragmatic, understandable, and focused on what really counts. If you want to do data science in any capacity, you need to read it.

Thomas H. Davenport
Distinguished Professor, Babson College and Fellow, MIT Initiative on the Digital Economy

I like your book. The chapters address problems that have faced statisticians for generations, updated to reflect today’s issues, such as computational big data.

Sir David Cox
Warden of Nuffield College and Professor of Statistics, Oxford University

I am already in love with your book based on the overview and preface!! What a creative approach! Speaks a lot to your ability to tell a good story – one of the key ways of reasoning for a good data scientist!

Hollylynne S. Lee
Professor, Mathematics and Statistics Education and Faculty Fellow, Friday Institute for Educational Innovation, North Carolina State University

The root causes of business failures typically are management, not technology. In today’s complex and changing digital world, the advice in The Real Work of Data Science is essential. Read it and do it.

John A. Zachman
Chairman – Zachman International and Executive Director – FEAC Institute

If you are wondering what the real challenges and solutions to solving your ‘Big Data’ problem are, this is a must‐read book. Ron and Tom move past the technology hype and highlight the real issues and opportunities in leveraging data science to the benefit of your organization.

Jeff MacMillan
Chief Analytics and Data Officer, Morgan Stanley Wealth Management

Much needed!

Neil Lawrence
Professor of Machine Learning at the University of Sheffield and Machine Learning team manager at Amazon

More than 80% of data science projects fail, either partially or wholly, at the implementation stage. There is a wealth of books on the technical and mechanical aspects of data science, but little to guide data scientists and managers on the holistic integration of data science into organizations in a way that produces success. This well‐written book fills that gap.

Peter Bruce
Founder and Chief Academic Officer, The Institute for Statistics Education

This book is very interesting and full of very good, intelligent, and useful things. It will without doubt be very valuable.

Jean Michel Poggi
Professor of Statistics at Paris‐Descartes University and Mathematics Laboratory, Orsay University, Paris, France, Past President of the Société Française de Statistique and Vice‐President of the Federation of European National Statistical Societies

I like the very direct and succinct style. You are certainly right on target when you say you can’t stress enough the importance of understanding the real problem. Other of your points in Chapter 1 really hit home, such as data scientists spending more time on data quality than on analysis. (I’m glad they do.) Further, you are absolutely correct that data scientists must translate their results into the language of the decision‐maker. I also recognize the liberal use of anecdotes in the book. For instance, the remarks about Bill Hunter, the ice cream sales, the Pokémon experiment, etc. I personally like this, and I do this in all of my speeches since I think it really hooks the audience.

Barry Nussbaum
Past Chief Statistician, the United States Environmental Protection Agency and Past President of the American Statistical Association

I think this book is excellent for an introductory course in data science. It could be used with students at university level or with professionals in specialist courses.

Luciana Dalla Valle
Lecturer in Statistics and Programme Manager of the MSc Data Science and Business Analytics, School of Computing, Electronics and Mathematics, Plymouth University, UK

The Real Work of Data Science addresses the softer issues of data science that actually decide on the success or failure of any data science initiative. It makes the data science and Chief Analytics Officer roles more understandable and accessible to a wider audience. Choosing the right modeling method is often the key point of discussion in books, although it is just a tiny fraction of the job to be done. This book prepares you for the harsh reality of data science in the real world.

Alexander Borek
Global Head of Data & Analytics at Volkswagen Financial Services

Data science is critical for competitiveness, for good government, for correct decisions. But what is data science? Kenett and Redman give, by far, the best introduction to the subject I have seen anywhere. They address the critical questions of formulating the right problem, collecting the right data, doing the right analyses, making the right decisions, and measuring the actual impact of the decisions. This book should become required reading in statistics and computer science departments, business schools, analytics institutes and, most importantly, by all business managers.

A. Blanton Godfrey
Joseph D. Moore Distinguished University Professor, Wilson College of Textiles, North Carolina State University



To Sima, our children and their families, and their
wonderful children: Yonatan, Alma, Tomer, Yadin, Aviv,
Gili, Matan, Eden and Ethan

                        Ron


To my wife Nancy, our six children, and our grandchildren

                        Tom

About the Authors

Prof. Ron S. Kenett is Chairman of the KPA Group and Senior Research Fellow at the Samuel Neaman Institute, Technion, Haifa, Israel. He is an applied statistician combining expertise in academic, consulting, and business domains. Ron is past president of the Israel Statistical Association and the European Network for Business and Industrial Statistics. He has written more than 250 papers and 14 books on statistical methods and applications. He was awarded the 2013 Greenfield Medal by the English Royal Statistical Society and the 2018 Box Medal by the European Network for Business and Industrial Statistics in recognition of excellence in contributions to the development and application of statistics.

Dr. Thomas C. Redman, “the Data Doc,” President of Data Quality Solutions, helps start‐ups, multinationals, senior executives, chief data officers, and leaders buried deep in their organizations chart their courses to data‐driven futures, with special emphasis on quality and data science. The author of five other books and hundreds of papers, Tom's most important article is “Data's Credibility Problem” (Harvard Business Review, December 2013). He has a PhD in statistics and two patents. Tom lives in Rumson, New Jersey, with his wife, Nancy.

Preface

This book has its roots in a chance meeting brought on when Ron responded to an article on data science that Tom published. One short discussion led to another, quickly narrowing to a common theme: we shared the experience that, in order to help companies and organizations become better at exploiting data and statistical analysis, one needs something more than technical brilliance. For both of us, our most successful and impactful projects resulted from other factors, such as understanding the problem, narrowing the focus, delivering simple messages in powerful ways, being in the right spot at the right time, and building the trust of decision‐makers. Conversely, our failures stemmed not from poor technical work but from a failure to connect, on the right issues, with the right people, or in the right way.

We had both written, separately, on some aspects of these topics. Ron has studied how one generates information quality with a framework labeled “InfoQ”; Tom has addressed data quality and became known as “the Data Doc.” We wondered if we could help data scientists who work in companies and other organizations enjoy more and larger successes and endure fewer failures by putting our heads together.

Fad, Trend, or Fundamental Transformation?

It is no secret that “data,” broadly defined, is all the rage. And “data science,” including traditional statistics, Bayesian statistics, business intelligence, predictive analytics, big data, machine learning, and artificial intelligence (AI), is enjoying the spotlight. There are plenty of great successes, building on a rich tradition of statistics in government and industry and driven by increasing business needs, more data powered by social media and the Internet of Things, and the computer power to analyze it. Iconic new companies include Amazon, Facebook, Google, and Uber. At the same time, there are enormous issues: the Facebook/Cambridge Analytica scandals of early 2018 underscore threats to our privacy (Kenett et al. 2018), many fear that millions of jobs will be lost to artificial intelligence, analytics projects still fail at a high rate, and tremendous damage has resulted from some notable “successful” efforts, as described in O'Neil (2016).

Will data and data science power the next great economic miracle? Will they make solid contributions, more positive than negative? Or will they be just another fad confined to the scrap heap of failed ideas? Even worse, will they put our entire social fabric at risk? It is impossible to know.

We do know that data and data science can be truly transformative, improving customer satisfaction, increasing profits, and empowering people – we have seen it with our own eyes. We believe that data scientists have huge roles to play in tipping the scales toward the good in the questions above. This will require incredible commitment, determination, and follow‐through. We encourage data scientists, statisticians, and those who manage them to take up the cause, as we have. We want to do all we can to fully equip them.

Data Scientists and Chief Analytics Officers

In writing the book, we adopted four “personas” as readers. First is Sally, a 31‐year‐old data scientist who works in a midsize department or company. Sally's job involves producing management reports, although she does have some time for teasing insights from ever‐increasing volumes of untrustworthy data. Her title could be any of “data scientist,” “statistician,” “analyst,” “machine learning specialist,” and others. We are well aware that some people see differences between these titles. But (with one exception, below) those distinctions are meaningless for us. Whether you are trained as a statistician, computer scientist, physicist, or engineer, your job is to turn “data into information and better decisions,” as part of our title demands.

Our second reader persona is Divesh, the 50‐year‐old who has the top analytics job within his department, business unit, or company. His title may be “chief analytics officer,” “head of data science,” or something similar. Divesh may have no formal training in data science, but he is a seasoned manager. While Divesh's day job is to manage data science across his department, within his sphere, he also bears special responsibility for the “building stronger organizations” portion of our title.

Brian, a solid industrial statistician, aged 46 and employed as an internal consultant, is our third persona. Brian is simultaneously bemused and threatened by data science, and he sits on the sidelines way too much. We think Brian has much to offer and encourage him to join the effort.

A fourth persona has an outsized impact on data science and this book. It is Elizabeth, who heads up some department, division, or even an entire company. Liz hated statistics in college – it was a required course, poorly taught, and not connected to the rest of her studies. She has seen more and more power in data and data science over the last several years and is just beginning to explore what it means for her department. Liz is both excited about the possibilities and fearful that her efforts will fail miserably.

More than anything, Liz's success, or failure, will dictate the future of data science. She can ignore it (and there are plenty of good reasons to do so) or become an increasingly demanding customer. If she fully embraces data and data science, she can transform her department.

Introduction to the Book

Sally, Divesh, and Brian have different needs but share a common theme. Their business is to turn numbers into information and insights. To be useful, their analyses need to guide decisions that carry a positive impact in the workplace. In other words, they need to help Liz succeed.

We packaged our experience in 18 short chapters directly relevant to our four main personas. We do not deal with technical issues but instead focus on the make‐or‐break ingredients in data‐driven transformation.

The chapters cover the different steps data scientists take in organizations. We discuss their role as individuals and through their organizational positions. We present lots of models that have helped us, we discuss the integration of hard and soft data in analytic work, and we stress the importance of impact (as opposed to technical excellence). The book also provides a context and opens curtains to landscapes that are not usually explored by most experts in data analysis.

We build on the contributions of statisticians like Box, Breiman, Cox, Deming, Hahn, and Tukey; cognitive psychologists like Kahneman and Tversky; and leaders in other disciplines to address current and future challenges. We also connect theory and applications, past contributions and modern developments, organizational needs and the means to fulfill them.

We've been as direct and to the point as we are able. This book should help you think more broadly about your job. Those seeking cookbook‐style “how‐tos” will be sadly disappointed. The book does provide an overview, benchmarks, and objectives, but you will have to develop your own concrete action plans.

We will be successful if readers take ideas introduced here and apply them in ways that best suit their own skill sets, the needs of decision‐makers they serve, and the cultures of their organizations. Data and analytics can transform organizations for the good – we encourage data scientists and applied statisticians to do their part, to help decision‐makers become more effective, and to keep this transformation on the right track.

About the Companion Website

This book is accompanied by a companion website:

www.wiley.com/go/kenett‐redman/datascience

The website material includes:

  • A List of Useful Links
