TURNING DATA INTO INFORMATION, BETTER DECISIONS, AND STRONGER ORGANIZATIONS
These two authors are world‐class experts on analytics, data management, and data quality; they’ve forgotten more about these topics than most of us will ever know. Their book is pragmatic, understandable, and focused on what really counts. If you want to do data science in any capacity, you need to read it.
Thomas H. Davenport
Distinguished Professor, Babson College and Fellow, MIT Initiative on the Digital Economy
I like your book. The chapters address problems that have faced statisticians for generations, updated to reflect today’s issues, such as computational big data.
Sir David Cox
Warden of Nuffield College and Professor of Statistics, Oxford University
I am already in love with your book based on the overview and preface!! What a creative approach! Speaks a lot to your ability to tell a good story – one of the key ways of reasoning for a good data scientist!
Hollylynne S. Lee
Professor, Mathematics and Statistics Education and Faculty Fellow, Friday Institute for Educational Innovation, North Carolina State University
The root causes of business failures typically are management, not technology. In today’s complex and changing digital world, the advice in The Real Work of Data Science is essential. Read it and do it.
John A. Zachman
Chairman – Zachman International and Executive Director – FEAC Institute
If you are wondering what the real challenges and solutions to solving your ‘Big Data’ problem are, this is a must‐read book. Ron and Tom move past the technology hype and highlight the real issues and opportunities in leveraging data science to the benefit of your organization.
Jeff MacMillan
Chief Analytics and Data Officer, Morgan Stanley Wealth Management
Much needed!
Neil Lawrence
Professor of Machine Learning at the University of Sheffield and Machine Learning team manager at Amazon
More than 80% of data science projects fail, either partially or wholly, at the implementation stage. There is a wealth of books on the technical and mechanical aspects of data science, but little to guide data scientists and managers on the holistic integration of data science into organizations in a way that produces success. This well‐written book fills that gap.
Peter Bruce
Founder and Chief Academic Officer, The Institute for Statistics Education
This book is very interesting and full of very good, intelligent, and useful things. It will without doubt be very valuable.
Jean Michel Poggi
Professor of Statistics at Paris‐Descartes University and Mathematics Laboratory, Orsay University, Paris, France,
Past President of the Société Française de Statistique and Vice‐President of the Federation of European National Statistical Societies
I like the very direct and succinct style. You are certainly right on target when you say you can’t stress enough the importance of understanding the real problem. Other points of yours in Chapter 1 really hit home, such as data scientists spending more time on data quality than on analysis. (I’m glad they do.) Further, you are absolutely correct that data scientists must translate their results into the language of the decision‐maker. I also recognize the liberal use of anecdotes in the book, for instance, the remarks about Bill Hunter, the ice cream sales, and the Pokémon experiment. I personally like this, and I do this in all of my speeches since I think it really hooks the audience.
Barry Nussbaum
Past Chief Statistician, the United States Environmental Protection Agency and Past President of the American Statistical Association
I think this book is excellent for an introductory course in data science. It could be used with students at university level or with professionals in specialist courses.
Luciana Dalla Valle
Lecturer in Statistics and Programme Manager of the MSc Data Science and Business Analytics, School of Computing, Electronics and Mathematics, Plymouth University, UK
The Real Work of Data Science addresses the softer issues of data science that actually decide the success or failure of any data science initiative. It makes the data science and Chief Analytics Officer roles more understandable and accessible to a wider audience. Choosing the right modeling method is often the key point of discussion in books, although it is just a tiny fraction of the job to be done. This book prepares you for the harsh reality of data science in the real world.
Alexander Borek
Global Head of Data & Analytics at Volkswagen Financial Services
Data science is critical for competitiveness, for good government, for correct decisions. But what is data science? Kenett and Redman give, by far, the best introduction to the subject I have seen anywhere. They address the critical questions of formulating the right problem, collecting the right data, doing the right analyses, making the right decisions, and measuring the actual impact of the decisions. This book should become required reading in statistics and computer science departments, business schools, analytics institutes and, most importantly, for all business managers.
A. Blanton Godfrey
Joseph D. Moore Distinguished University Professor, Wilson College of Textiles, North Carolina State University
This edition first published 2019
© 2019 Ron S. Kenett and Thomas C. Redman
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Ron S. Kenett and Thomas C. Redman to be identified as the authors of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging‐in‐Publication data has been applied for
ISBN: 9781119570707
Cover Design: Wiley
Cover Image: © enisaksoy/Getty Images
To Sima, our children and their families, and their
wonderful children: Yonatan, Alma, Tomer, Yadin, Aviv,
Gili, Matan, Eden and Ethan
Ron
To my wife Nancy, our six children, and our grandchildren
Tom
Prof. Ron S. Kenett is Chairman of the KPA Group and Senior Research Fellow at the Samuel Neaman Institute, Technion, Haifa, Israel. He is an applied statistician combining expertise in academic, consulting, and business domains. Ron is past president of the Israel Statistical Association and the European Network for Business and Industrial Statistics. He has written more than 250 papers and 14 books on statistical methods and applications. He was awarded the 2013 Greenfield Medal by the English Royal Statistical Society and the 2018 Box Medal by the European Network for Business and Industrial Statistics in recognition of excellence in contributions to the development and application of statistics.
Dr. Thomas C. Redman, “the Data Doc,” President of Data Quality Solutions, helps start‐ups, multinationals, senior executives, chief data officers, and leaders buried deep in their organizations chart their courses to data‐driven futures, with special emphasis on quality and data science. The author of five other books and hundreds of papers, Tom's most important article is “Data's Credibility Problem” (Harvard Business Review, December 2013). He has a PhD in statistics and two patents. Tom lives in Rumson, New Jersey, with his wife, Nancy.
This book has its roots in a chance meeting brought on when Ron responded to an article on data science that Tom published. One short discussion led to another, quickly narrowing to a common theme: we shared the experience that, in order to help companies and organizations become better at exploiting data and statistical analysis, one needs something more than technical brilliance. For both of us, our most successful and impactful projects resulted from other factors, such as understanding the problem, narrowing the focus, delivering simple messages in powerful ways, being in the right spot at the right time, and building the trust of decision‐makers. Conversely, our failures stemmed not from poor technical work but from a failure to connect, on the right issues, with the right people, or in the right way.
We had both written, separately, on some aspects of these topics. Ron has studied how one generates information quality with a framework labeled “InfoQ”; Tom has addressed data quality and become known as “the Data Doc.” We wondered if, by putting our heads together, we could help data scientists who work in companies and other organizations enjoy more and larger successes and endure fewer failures.
It is no secret that “data,” broadly defined, is all the rage. And “data science,” including traditional statistics, Bayesian statistics, business intelligence, predictive analytics, big data, machine learning, and artificial intelligence (AI), is enjoying the spotlight. There are plenty of great successes, building on a rich tradition of statistics in government and industry, driven by increasing business needs, more data powered by social media, the Internet of Things, and the computer power to analyze it. Iconic new companies include Amazon, Facebook, Google, and Uber. At the same time, there are enormous issues: the Facebook/Cambridge Analytica scandals of early 2018 underscore threats to our privacy (Kenett et al. 2018), many fear that millions of jobs will be lost to artificial intelligence, analytics projects still fail at a high rate, and tremendous damage has resulted from some notable “successful” efforts, as described in O'Neil (2016).
Will data and data science power the next great economic miracle? Will they make solid contributions, more positive than negative? Or will they be just another fad confined to the scrap heap of failed ideas? Even worse, will they put our entire social fabric at risk? It is impossible to know.
We do know that data and data science can be truly transformative, improving customer satisfaction, increasing profits, and empowering people – we have seen it with our own eyes. We believe that data scientists have huge roles to play in tipping the scales toward the good in the questions above. This will require incredible commitment, determination, and follow‐through. We encourage data scientists, statisticians, and those who manage them to take up the cause, as we have. We want to do all we can to fully equip them.
In writing the book, we adopted four “personas” as readers. First is Sally, a 31‐year‐old data scientist who works in a midsize department or company. Sally's job involves producing management reports, although she does have some time for teasing insights from ever‐increasing volumes of untrustworthy data. Her title could be any of “data scientist,” “statistician,” “analyst,” “machine learning specialist,” and others. We are well aware that some people see differences between these titles. But (with one exception, below) those distinctions are meaningless for us. Whether you are trained as a statistician, computer scientist, physicist, or engineer, your job is to turn “data into information and better decisions,” as part of our title demands.
Our second reader persona is Divesh, the 50‐year‐old who has the top analytics job within his department, business unit, or company. His title may be “chief analytics officer,” “head of data science,” or something similar. Divesh may have no formal training in data science, but he is a seasoned manager. While Divesh's day job is to manage data science across his department, within his sphere, he also bears special responsibility for the “building stronger organizations” portion of our title.
Brian, a solid industrial statistician, aged 46 and employed as an internal consultant, is our third persona. Brian is simultaneously bemused and threatened by data science, and he sits on the sidelines way too much. We think Brian has much to offer and encourage him to join the effort.
A fourth persona has an outsized impact on data science and this book. It is Elizabeth, who heads up some department, division, even an entire company. Liz hated statistics in college – it was a required course, poorly taught, and not connected to the rest of her studies. She has seen more and more power in data and data science over the last several years and is just beginning to explore what it means for her department. Liz is both excited about the possibilities and fearful that her efforts will fail miserably.
More than anything, Liz's success, or failure, will dictate the future of data science. She can ignore it (and there are plenty of good reasons to do so) or become an increasingly demanding customer. If she fully embraces data and data science, she can transform her department.
Sally, Divesh, and Brian have different needs but share a common theme. Their business is to turn numbers into information and insights. To be useful, their analyses need to guide decisions that carry a positive impact in the workplace. In other words, they need to help Liz succeed.
We packaged our experience in 18 short chapters directly relevant to our four main personas. We do not deal with technical issues but instead focus on the make‐or‐break ingredients in data‐driven transformation.
The chapters cover the different steps data scientists take in organizations. We discuss their role as individuals and through their organizational positions. We present lots of models that have helped us, we discuss the integration of hard and soft data in analytic work, and we stress the importance of impact (as opposed to technical excellence). The book also provides a context and opens curtains to landscapes that are not usually explored by most experts in data analysis.
We build on the contributions of statisticians like Box, Breiman, Cox, Deming, Hahn, and Tukey; cognitive psychologists like Kahneman and Tversky; and leaders in other disciplines to address current and future challenges. We also connect theory and applications, past contributions and modern developments, organizational needs and the means to fulfill them.
We've been as direct and to the point as we are able. This book should help you think more broadly about your job. Those seeking cookbook‐style “how‐tos” will be disappointed. The book does provide an overview, benchmarks, and objectives, but you will have to develop your own concrete action plans.
We will be successful if readers take ideas introduced here and apply them in ways that best suit their own skill sets, the needs of decision‐makers they serve, and the cultures of their organizations. Data and analytics can transform organizations for the good – we encourage data scientists and applied statisticians to do their part, to help decision‐makers become more effective, and to keep this transformation on the right track.
This book is accompanied by a companion website:
www.wiley.com/go/kenett-redman/datascience