mind+machine

A Decision Model for Optimizing and Implementing Analytics

Marc Vollenweider

To my wife Gabi and our children Michèle,
Alexandra, Eric, and Mike

Preface

Thank you for buying this book.

In 2015, after 15 years of operations in the field of research and analytics, we decided to adopt the notion of mind+machine at Evalueserve. We believe this marriage of the perceptive power of the human brain with the benefits of automation is essential because neither mind nor machine alone will be able to handle the complexities of analytics in the future.

The editorial team at John Wiley & Sons approached me in November 2015 to ask if I would like to write a book on how our mind+machine approach could help with the management of information-heavy processes—a topic that is of increasing interest to companies worldwide. We got very positive feedback from clients, friends, and colleagues on the idea, and decided to go ahead.

Mind+Machine is for generalist mainstream middle and top managers in business functions such as sales, marketing, procurement, R&D, supply chain, and corporate support functions, particularly in business-to-business (B2B) and business-to-consumer (B2C) industries. We're writing for the hopeful beneficiaries and end users of analytics, and for people who might need to make decisions about analytics, now or in the future. The book is not a technical text primarily addressed to data scientists—although I firmly believe that even those specialists have something to learn about the primary problem in generating return on investment (ROI) from analytics.

We won't be looking at super-advanced but rare analytics use cases—there are specialized textbooks for those. Instead, we're looking at the efficient frontier, offering practical help on dealing with the logistics of managing and improving decision-making support and getting positive ROI at the same time.

After reading this book, you should know about key issues in the value chain of mind+machine in analytics, and be in a position to ask your data scientists, IT specialists, and vendors the right questions. You should understand the options and approaches available to you before you spend millions of dollars on a new proposal. You'll learn some useful things to demystify the world of analytics.

We're also proposing a novel approach, the Use Case Methodology (UCM), to give you a set of tangible and tested tools to make your life easier.

We've included 39 detailed case studies and plenty of real-life anecdotes to illustrate the applications of mind+machine. I'm sure you'll recognize some of your own experiences. And you'll see that you're far from alone in your quest to understand analytics.

What makes me want to put these ideas about analytics problems and their solutions out into the world is conversations like the following two.

The first words to me from a very senior line manager in a B2B corporation:

“Marc, is this meeting going to be about big data? If so, I'll stop it right here. Vendors are telling me that I need to install a data lake and hire lots of increasingly rare and expensive statisticians and data scientists. My board is telling me that I need to do ‘something' in big data. It all sounds unjustifiably expensive and complex. I just want to make sure that my frontline people are going to get what they need in time. I keep hearing from other companies that after an initial burst of analytics activity, real life caught up with them, the line guys are still complaining about delays, and the CFO is asking a lot of questions about the spend on big data.”

During a meeting with the COO of an asset manager to define the scope of a project:

“We do thousands of pitches to pension funds and other institutional investors every year. We have over 25 different data sources with quantitative data and qualitative information, with lots of regional flavors. However, we still put the pitches together manually and get the sign-offs from the legal department by e-mail. There must be a smarter way of doing this.”

Why has analytics become such a controversial and challenging field? Why are managers either daunted by overhyped new initiatives and processes they don't understand or frustrated by the feeling that there should be a better way to do things, given all this talk about better, bigger, brighter analytics?

Typical line managers want to get the right decision-making support to the right people at the right time in the right format. The proliferating number of analytics use cases and available data sets is not matched by an expansion in individuals' and companies' capacities to mentally and logistically absorb the information. Additionally, existing and new compliance requirements are piling up at a remarkable speed, especially in industries with a high regulatory focus, such as financial services and health care.

Analytics itself is not truly the issue. In most cases, the problem is the logistics of getting things done in organizations: defining the workflow and executing it efficiently, reaching internal alignment, navigating the complexities of IT projects, and clearing the other organizational hurdles that hamper progress. These complexities slow things down or make projects diverge from their original objectives, so that the actual beneficiaries of the analytics (e.g., the key account manager or the procurement manager in the field) don't get what they need in time.

Many other issues plague the world of analytics: the proliferation of unintuitive jargon about data lakes and neural networks; the often-overlooked psychology of data analytics, which leads companies to cling too tightly to the idea of the power of data and makes implementations more complex than required; and the marketing hype engines making promises that no technology can fulfill.

Hundreds of client interactions at Evalueserve, along with conversations with my former colleagues in the strategy consulting world, have made it increasingly clear that there is a strong unmet need among general managers for a simplified framework enabling efficient and effective navigation of information-heavy decision-support processes. Simplicity should always win over complex and nontransparent processes—the analytics space is no exception.

I want to demystify analytics. I'll start with the fundamental observation that terms such as big data and artificial intelligence are getting so much attention in the media that the bricks-and-mortar topics of everyday analytics aren't getting the attention they deserve: topics such as problem definition, data gathering, cleansing, analysis, visualization, dissemination, and knowledge management. Applying big data to every analytics problem would be like taking one highly refined chef's tool—a finely balanced sushi knife, for example—and trying to use it for every task. While very useful big data use cases have emerged in several fields, they represent maybe 5 percent of all of the billions of analytics use cases.

What are the other 95 percent of use cases about? Small data. It is amazing how many analytics use cases require very little data to achieve a lot of impact. My favorite use case that illustrates the point is one where just 800 bits of information saved an investment bank a recurring annual cost of USD 1,000,000. We will discuss the details of this use case in Part I.

Granted, not every use case performs like that, but I want to illustrate the point that companies have lots of opportunities to analyze their existing data with very simple tools, and that there is very little correlation between ROI and the size of the data set.

Mind+Machine addresses end-to-end, information-heavy processes that support decision making or produce information-based output, such as sales pitches or research and data products, either for internal recipients or for external clients or customers. This includes all types of data and information: qualitative and quantitative; financial, business, and operational; static and dynamic; big and small; structured and unstructured.

The concept of mind+machine addresses how the human mind collaborates with machines to improve productivity, time to market, and quality, or to create new capabilities that did not exist before. This book is not about the creation of physical products or using physical machines and robots as in an Industry 4.0 model. Additionally, we will look at the full end-to-end value chain of analytics, which is far broader than just solving the analytics problem or getting some data. And finally, we will ask how to ensure that analytics helps us make money and satisfy our clients.

In Part I, we'll analyze the current state of affairs in analytics, dispelling the top 12 fallacies that have taken over the perception of analytics. It is surprising how entrenched these fallacies have become in the media and even in very senior management circles. I hope Part I will give you some tools to deal with the marketing hype, senior management expectations, and the jargon of the field. Part I also contains the 800 bits use case. I'm sure you can't wait to read the details.

In Part II, we'll examine the key trends affecting analytics and driving positive change. These trends are essentially good news for most users and decision makers in the field: they set the stage for a dramatic simplification of processes, requiring less IT spend and shorter development cycles, offering increasingly user-friendly interfaces, and laying the basis for new and profitable use cases. We'll examine key questions, including:

In Part III, we will look at best practices in mind+machine, examining the end-to-end value chain of analytics via the Use Case Methodology (UCM) and focusing on how to get things done. You will find practical recommendations on how to design and manage individual use cases as well as how to govern whole portfolios of use cases.

Some of the key questions we'll address are:

However, just looking at the individual use cases is not enough, as whole portfolios of use cases need to be managed. Therefore, this part will also answer the following questions:

At the end of Part III you should be in a position to address the main challenges of mind+machine, both for individual use cases and for portfolios of use cases.

Throughout the book I use numerous analogies from the non-nerd world to make the points, trying to avoid too much specialist jargon. Some of them might be a bit daring, but I hope they make for fun reading, loosening up the left-brained topic of analytics. If I can make you smile a few times while reading this book, my goal will have been achieved.

I'm glad to have you with me for this journey through the world of mind+machine. Thank you for choosing me as your guide. Let us begin!

Acknowledgments

My heartfelt thanks to Evalueserve's loyal clients, employees, and partner firms, without whose contributions this book would not have been possible; to our four external contributors and partners: Neil Gardiner of Every Interaction, Michael Müller of Acrea, Alan Day of State of Flux, and Stephen Taylor of Stream Financial; to our brand agency Earnest for their thought leadership in creating our brand; to all the Evalueserve teams and the teams of our partner firms MP Technology, Every Interaction, Infusion, Earnest, and Acrea for creating and positioning InsightBee and other mind+machine platforms; to the creators, owners, and authors of all the use cases in this book and their respective operations teams; to Jean-Paul Ludig, who helped me keep the project on track; to Derek and Seven Victor for their incredible help in editing the book; to Evalueserve's marketing team; to the Evalueserve board and management team for taking a lot of operational responsibilities off my shoulders, allowing me to write this book; to John Wiley & Sons for giving me this opportunity; to Ursula Hueby for keeping my logistics on track during all these years; to Ashish Gupta, our former COO, for being a friend and helping build the company from the very beginning; to Alok Aggarwal for co-founding the company; to his wife Sangeeta Aggarwal for introducing us; and above all to my wonderful wife Gabi for supporting me during all these years, actively participating in all of Evalueserve's key events, being a great partner for both everyday life and grand thought experiments, and for inspiring me to delve into the psychology of those involved at all levels of mind+machine analytics.

—Marc Vollenweider

List of Use Cases

  1. Innovation Analytics: Nascent Industry Growth Index
  2. Cross-Sell Analytics: Opportunity Dashboard
  3. Subscription Management: “The 800 Bits Use Case”
  4. Innovation Scouting: Finding Suitable Innovations
  5. Virtual Data Lake: A Use Case by Stream Financial
  6. InsightBee: The Last Mile
  7. Market Intelligence Solution Suite: Build for Flexibility
  8. Intellectual Property: Managing Value-Added IP Alerts
  9. Investment Banking Analytics: Logo Repository
  10. Managing Indirect Procurement Market Intelligence: Efficient Procurement
  11. InsightBee Procurement Intelligence: Efficient Management of Procurement Risks
  12. Brand Perception Analytics: Assessing Opinions in the Digital Realm
  13. Wealth Management: InsightBee for Independent Financial Advisors
  14. Analytics in Internet of Things: Benchmarking Machines Using Sensor Data
  15. InsightBee: Market Intelligence via Pay-as-You-Go
  16. Virtual Analyst: Intelligent Pricing & Dynamic Discounting
  17. InsightBee Sales Intelligence: Proactive Identification of New Sales Opportunities
  18. Customer Analytics: Aiding Go-to-Market Strategy
  19. Social Insights: Asian Language Social Media Insights
  20. Managing Research Flow: Workflow and Self-Service
  21. Automation in Asset Management: Fund Fact Sheets
  22. Investment Banking: Automating Routine Tasks
  23. Mind–Machine Interface: Game Controller
  24. Financial Benchmark Analytics: Environmental Index
  25. Industry Sector Update: Marketing Presentations
  26. Investment Banking: A Global Offshore Research Function
  27. Financial Benchmark Analytics: Index Reporting
  28. Energy Retailer: Competitive Pricing Analytics
  29. Intellectual Property: Identifying and Managing IP Risk
  30. Market and Customer Intelligence: Market Inventories
  31. Customer Churn Analytics: B2B Dealer Network
  32. Preventive Maintenance: Analyzing and Predicting Network Failures
  33. Supply Chain Framework: Bottlenecks Identification
  34. Spend Analytics: Category Planning Tool
  35. Predictive Analytics: Cross-Selling Support
  36. Operating Excellence Analytics: Efficiency Index
  37. Financial Services: Investment Banking Studio
  38. Sales Enablement: Account-Based Marketing Support
  39. InsightBee: A UX Design Study by Every Interaction

Part I
The Top 12 Fallacies about Mind+Machine

The number of opportunities with great potential for mind+machine is large and growing. Many companies have already begun successfully leveraging this potential, building whole digital business models around smart minds and effective machines. Despite the potential for remarkable return on investment (ROI), there are pitfalls—particularly if you fall into the trap of believing some of the common wisdom in analytics, which is exposed as fallacy on closer examination.

Some vendors might not agree with the view that current approaches have serious limitations, but the world of analytics is showing some clear and indisputable symptoms that all is not well. To ensure you can approach mind+machine successfully, I want to arm you with insights into the traps and falsehoods you will very likely encounter.

First, let's make sure we all know what successful analytics means: the delivery of the right insight to the right decision makers at the right time and in the right format. Anything else means a lessened impact—which is an unsatisfactory experience for all involved.

The simplest analogy is to food service. Success in a restaurant means the food is tasty, presented appropriately, and delivered to the table on time. It's not enough to have a great chef if the food doesn't reach the table promptly. And the most efficient service won't save the business if the food is poor quality or served with the wrong utensils.

The impact on a business from analytics should be clear and strong. However, many organizations struggle, spending millions or even tens of millions on their analytics infrastructure but failing to receive high-quality insights in a usable form when they are needed—and thus failing to get the right return on their investments. Why is that?

Analytics serves the fundamental desire to support decisions with facts and data. In the minds of many managers, it's a case of the more, the better. And there is certainly no issue with finding data! The rapid expansion in the availability of relatively inexpensive computing power and storage has been matched by the unprecedented proliferation of information sources. There is a temptation to see more data combined with more computing power as the sole solution to all analytics problems. But the human element must not be underestimated.

I vividly remember my first year at McKinsey Zurich. It was 1990, and one of my first projects was a strategy study in the weaving machines market. I was really lucky, discovering around 40 useful data points and some good qualitative descriptions in the 160-page analyst report procured by our very competent library team. We also conducted 15 qualitative interviews and found another useful source.

In today's terms, our sources provided a combined study-relevant data volume of 2 to 3 kilobytes. We used this information to create a small but robust model in Lotus 1-2-3 on a standard laptop. Those insights proved accurate: in 2000, I came across the market estimates again and found that we had been only about 5 percent off.

Granted, this may have been luck, but my point is that deriving valuable insight—finding the “so what?”—required thought, not just the mass of data and raw computing power that many see as the right way to do analytics. Fallacies like this and the ones I outline in this part of the book are holding analytics back from achieving its full potential.

Fallacy #1
Big Data Solves Everything

From Google to start-up analytics firms, many companies have successfully implemented business models around the opportunities offered by big data. The growing number of analytics use cases include media streaming, business-to-consumer (B2C) marketing, risk and compliance in financial services, surveillance and security in the private sector, social media monitoring, and preventive maintenance strategies (Figure I.1). However, throwing big data at every analytics use case isn't always the way to generate the best return on investment (ROI).

Figure I.1 Areas of Big Data Impact. [The figure groups areas of impact into three segments. B2C: consumer insight and advertising, search and information, sales and e-commerce, supply chain and logistics, customer service and maintenance, risk and compliance, Internet of Things, and infrastructure. B2B: manufacturing, Internet of Things, supply chain and logistics, R&D, customer services and maintenance, risk and compliance, and infrastructure. Public sector: security and surveillance, law enforcement, traffic, health care, science, tax, and infrastructure.]

Before we explore the big data fallacy in detail, we need to define analytics use case, a term you'll encounter a lot in this book. Here is a proposed definition:

“An analytics use case is the end-to-end analytics support solution applied once or repeatedly to a single business issue faced by an end user or homogeneous group of end users who need to make decisions, take actions, or deliver a product or service on time based on the insights delivered.”

What are the implications of this definition? First and foremost, use cases are really about the end users and their needs, not about data scientists, informaticians, or analytics vendors. Second, the definition does not specify the data as small or big, qualitative or quantitative, static or dynamic—the type, origin, and size of the data input sets are open. Whether humans or machines or a combination thereof deliver the solution is also not defined. However, it is specific on the need for timely insights and on the end-to-end character of the solution, which means the complete workflow from data creation to delivery of the insights to the decision maker.

Now, getting back to big data: the list of big data use cases has grown significantly over the past decade and will continue to grow. With the advent of social media and the Internet of Things, we are faced with a vast number of information sources, with more to come. Continuous data streams are becoming increasingly prevalent. As companies offering big data tools spring up like mushrooms, people are dreaming up an increasing number of analytics possibilities.

One of the issues with talking about big data, or indeed small data, is the lack of a singular understanding of what the term means. It's good hype in action: an attractive name with a fuzzy definition. I found no fewer than 12 different definitions of big data while researching this book! I'm certainly not going to list all of them, but I can help you understand them by categorizing them into two buckets: the geek's concept and the anthropologist's view.

Broadly speaking, tech geeks define big data in terms of volume; velocity (speed); variety (types include text, voice, and video); structure (which can mean structured, such as tables and charts, or unstructured, such as user comments from social media channels); variability over time; and veracity (i.e., the level of quality assurance). There are two fundamental problems with this definition. First, nobody has laid down any commonly accepted limits for what counts as big or small, unsurprisingly given that this is a fast-moving target; and second, there is no clear “so what?” from this definition. Why do all of these factors matter to the end user when they are all so variable?

That brings us to the anthropologist's view, which focuses on the objective. Wikipedia provides an elegant definition that expresses the ambiguity, associated activities, and ultimate objective:

Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.

High-ROI use cases for big data existed before the current hype. Examples are B2C marketing analytics and advertising, risk analytics, and fraud detection. They've been proven in the market and have consistently delivered value. There are also use cases for scientific research and for national security and surveillance, where ROI is hard to measure but there is a perceived gain in knowledge and security level (although this latter gain is often debated).

We've added a collection of use cases throughout this book to help give you insight into the real-world applications of what you're learning. They all follow the same format to help you quickly find the information of greatest interest to you.

[Use case profile: Innovation Analytics—Nascent Industry Growth Index. Charts show mind intensity versus machine intensity (percent) over years 0 to 2; machine component ratings for analysis (5), productivity (5), workflow (3), dissemination (2), and knowledge management (3); mind component ratings for project management (2), business acumen (4), analysis (4), insight (4), and innovation (5); and mind intensity in FTEs over 15 months.]

The big data hype has its origin in three factors: the appearance of new data types or sources, such as social media; the increasing availability of connected devices, from mobile phones to machine sensors; and the evolution of ways to analyze large data sets in short periods of time. The sense of possibility led to a proliferation of use cases. We cannot say how many of these untested use cases will survive. Ultimately, the question is not what can be done, but what actually delivers value to the end user.

Gartner predicts that 60 percent of big data initiatives will fail in 2017,[1] and Wikibon, an open-source research firm, maintains that the average ROI for big data projects is currently only about 55 cents on the dollar spent instead of the expected $3 to $4.[2] The latter assessment wasn't made by CFOs, but came directly from practitioners, who saw a “lack of compelling need” for big data in those use cases as a reason for the low returns. However, our experience is that CFOs are increasingly asking about the viability of such analytics.

For large companies, the investment in big data infrastructure and expertise can easily run into the tens of millions of dollars. It would seem obvious that prior to any such investment, the company would want to fully investigate the need, and yet in the 2012 BRITE/NYAMA “Marketing Measurement in Transition” study, 57 percent of companies self-reported that their marketing budgets were not based on ROI analysis.[3]

Measuring the ROI of analytics use cases is unfortunately not as easy as it sounds. This is especially true where companies have invested in infrastructure such as central data warehouses, software licenses, and data scientist teams. Properly calculating the desired impact at the use case level requires the corresponding governance and control, which is rare at this stage. In a series of initial interviews with companies that went on to become Evalueserve clients, seven areas were found to be lacking—in some cases, almost completely:

  1. Governance structure for the data and use case ownership
  2. Accountability for individual use cases, portfolio management, and associated economics
  3. Clear definition of analytics use cases
  4. Objectives and intended end user benefits for each use case
  5. Tracking the actual results against the targets
  6. Knowledge management allowing the efficient reuse of prior work
  7. Audit trails for the people, timing, actions, and results regarding the code, data, and findings

That said, examples of excellent and highly focused big data use case management do exist. The use case Cross-Sell Analytics: Opportunity Dashboard shows solid accountability. The campaign management function of the bank continually measures the ROI of campaigns end to end, and has built a focused factory for a portfolio of such analytics.

An example of a much weaker big data use case was recently proposed to me by a US start-up engaged in human resources (HR) analytics. The example illustrates some of the fundamental issues with the current hype. An ex-consultant and an ex-national security agent suggested using a derivative of software developed for the surveillance field for recruiting analytics. Based on the previous five to 10 years of job applications—the curriculum vitae (CV) or resume and cover letter—and the performance data of the corresponding employees, a black-box algorithm would build a performance prediction model for new job applicants. The software would deliver hire/no hire suggestions after receiving the data of the new applications.

We rejected the proposal for two reasons: the obvious issue of data privacy and the expected ROI. Having done thousands of interviews, I have a very simple view of resumes. They deliver basic information that's been heavily fine-tuned by more or less competent coaching, and they essentially hide the candidate's true personality. I would argue that the predictive value of CVs has decreased over the past 20 years. Cultural bias in CV massaging is another issue. Human contact—preferably eye contact—is still the only way to cut through these walls of disguise.

The black-box algorithm would therefore have a very severe information shortage, making it not just inefficient, but actually in danger of producing a negative ROI in the form of many wrong decisions. When challenged on this, the start-up's salesperson stated that a “human filter” would have to be applied to find the false positives. Since a black-box algorithm is involved, there is no way of knowing how the software's conclusion was reached, so the analysis would need to be redone 100 percent, reducing the ROI still further.

It was also interesting to see that this use case was being sold as big data. It's a classic example of riding the wave of popularity of a term. Even under the most aggressive scenarios, our human resources performance data is not more than 300 to 400 megabytes, which hardly constitutes big data. Always be wary of excessive marketing language and the corresponding promises!

These are just two isolated use cases, which is certainly not enough to convince anyone trained in statistics, including myself. Therefore, it is necessary to look at how relevant big data analytics is in the overall demographics of analytics. To the best of my knowledge, this is not something that has ever been attempted in a study.

First, it's necessary to count the number of analytics use cases and put them into various buckets to create a demographic map of analytics (Figure I.2). One cautionary note: counting analytics use cases is tricky due to the variability of possible definitions, so there is a margin of error to the map, although I believe the order of magnitude is not too far off.

Figure I.2 Demographics of Use Cases. [Two pie charts: the data type of primary use cases (small data versus big data) and the customer type of primary use cases (B2C versus B2B).]

[Use case profile: charts show mind intensity versus machine intensity over years 0 to 3; machine component ratings for analysis (4), productivity (4), workflow (4), dissemination (5), and knowledge management (5); mind component ratings for project management (3), business acumen (4), analysis (5), insight (4), and innovation (5); and a production time of 345 hours over 3 months, split into development (200 hours), design (100 hours), and testing (45 hours).]

This map illustrates my first key point: big data is a relatively small part of the analytics world. Let's take a look at the main results of this assessment of the number of use cases.

  1. Globally, there are an estimated one billion implementations of primary use cases, a staggering number, of which about 85 percent are in B2B and about 15 percent in B2C companies. A primary use case is defined as a generic business issue that needs to be analyzed by a business function (e.g., marketing, R&D) of a company in a given industry and geography. An example could be the monthly analysis of the sales force performance for a specific oncology brand in the pharmaceutical industry in Germany. Similar analyses are performed in pretty much every pharmaceutical company selling oncology drugs in Germany.
  2. Around 30 percent of companies require high analytics intensity and account for about 90 percent of the primary analytics use cases. International companies with multiple country organizations and global functions and domestic companies with higher complexity are the main players here.
  3. The numbers increase to a staggering 50 to 60 billion use cases globally when looking at secondary implementations, which are defined as micro-variations of primary use cases throughout the business year. For example, slightly different materials or sensor packages in different packaging machines might require variant analyses, but the underlying use case of “preventive maintenance for packaging machines” would still remain the same. While not a precise science, this primary versus secondary distinction will be very relevant for counting the number of analytics use cases in the domain of Internet of Things and Industry 4.0. A simple change in sensor configurations might lead to large numbers of completely new secondary use cases. This in turn would cause a lot of additional analytics work, especially if not properly managed for reuse.
  4. Only an estimated 5 to 6 percent of all primary use cases really require big data and the corresponding methodologies and technologies (see the quick calculation after this list). This finding is completely contrary to the image of big data in the media and public perception. While the number of big data use cases is growing, it can be argued that the same holds true for small data use cases.
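
To make these orders of magnitude concrete, here is a minimal back-of-the-envelope sketch in Python. The figures are the rough estimates from this section, not precise measurements, and the variable names are mine:

```python
# Rough arithmetic implied by the demographic estimates above.
primary_use_cases = 1_000_000_000   # ~1 billion primary use cases globally
b2b_share = 0.85                    # ~85 percent sit in B2B companies
b2c_share = 0.15                    # ~15 percent sit in B2C companies
big_data_share = 0.055              # ~5 to 6 percent truly need big data

print(f"B2B primary use cases: ~{primary_use_cases * b2b_share:,.0f}")
print(f"B2C primary use cases: ~{primary_use_cases * b2c_share:,.0f}")
print(f"Big data primary:      ~{primary_use_cases * big_data_share:,.0f}")
print(f"Small data primary:    ~{primary_use_cases * (1 - big_data_share):,.0f}")
```

Even with generous rounding, only on the order of 50 to 60 million of the one billion primary use cases call for big data methods; the remaining roughly 95 percent are small data.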

The conclusion is that data analytics is mainly a logistical challenge rather than just an analytical one. Managing the growing portfolios of use cases in sustainable and profitable ways is the true challenge and will remain so. In meetings, many executives tell us that they are not leveraging the small data sets their companies already have. We've seen that roughly 94 percent of use cases are really about small data. But do they provide lower ROI because they are based on small data sets? The answer is no—and again, this is totally contrary to the image portrayed in the media and in the sales pitches of big data vendors.

Let me make a bold statement that is inevitably greeted by some chuckles during client meetings: “Small data is beautiful, too.” In fact, I would argue that the average ROI of a small data use case is much higher due to the significantly lower investment. To illustrate my point, I'd like to present Subscription Management: “The 800 Bits Use Case,” which I absolutely love as it is such an extreme illustration of the point I'm making.

Using just 800 bits of HR information, an investment bank saved USD 1 million every year, generating an ROI of several thousand percent. How? Banking analysts use a lot of expensive data from databases paid for through individual seat licenses. After bonus time in January, the game of musical chairs starts and many analyst teams join competitor institutions, at which point their seat licenses should be canceled. In this case, that process step simply did not happen, as nobody thought about sending the corresponding instructions to the database companies in time. The bank therefore kept unnecessarily paying about USD 1 million annually. Why 800 bits? Whether someone is employed (“1”) or not (“0”) is a binary piece of information called a “bit.” With 800 analysts, the bank had 800 bits of HR information. The analytics rule was almost embarrassingly simple: “If no longer employed, send email to terminate the seat license.” All that needed to happen was a simple search for changes in employment status in the information from HR.
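
To show how little machinery the rule actually needs, here is a minimal sketch in Python. The data structures, identifiers, and the notification step are hypothetical illustrations of the idea, not the bank's actual implementation:

```python
# Hypothetical sketch of the 800-bits rule: one bit of employment status
# per analyst is enough to drive seat-license cancellations.

def licenses_to_terminate(employed: dict[str, bool],
                          active_licenses: set[str]) -> set[str]:
    """Return IDs of license holders who are no longer employed.

    employed: analyst ID -> True if still employed (one bit per analyst)
    active_licenses: analyst IDs that currently hold a paid seat license
    """
    return {analyst for analyst in active_licenses
            if not employed.get(analyst, False)}

# Example: after bonus season, two of three analysts have left.
employed = {"a001": True, "a002": False, "a003": False}
active_licenses = {"a001", "a002", "a003"}

for analyst in sorted(licenses_to_terminate(employed, active_licenses)):
    # In practice, this step would notify the database vendor to end the seat.
    print(f"Terminate seat license for {analyst}")
```

The entire “model” is a set comprehension over 800 booleans; the value comes from joining two pieces of information that sat in different organizational silos.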

The amazing thing about this use case is that it required just some solid thinking, linking a bit of employment information with the database licenses. Granted, not every use case is as profitable as this one, but years of experience suggest that good thinking combined with the right data can create a lot of value in many situations.

[Use case profile: charts show mind intensity versus machine intensity; machine component ratings for analysis (3), productivity (3), workflow (4), dissemination (3), and knowledge management (3); mind component ratings for project management (3), business acumen (3), analysis (3), insight (5), and innovation (4); and an effort of under one FTE over about 3 months, split between development and testing.]

This use case illustrates another important factor: the silo trap. Interesting use cases often remain unused because data sets are buried in two or more organizational silos, and nobody thinks about joining the dots. We will look at this effect again later.

Summing up the first fallacy: not everything needs to be big data. In fact, far more use cases are about small data, and the focus should be on managing portfolios of profitable analytics use cases regardless of what type of data they are based on.

Notes