Cover
Title Page
About the Authors
List of Figures
List of Tables
Preface
1 The Opportunity
1.1 Introduction
1.2 The Rise of Data
1.3 Realising Data as an Opportunity
1.4 Our Definition of Monetising Data
1.5 Guidance on the Rest of the Book
2 About Data and Data Science
2.1 Introduction
2.2 Internal and External Sources of Data
2.3 Scales of Measurement and Types of Data
2.4 Data Dimensions
2.5 Quality of Data
2.6 Importance of Information
2.7 Experiments Yielding Data
2.8 A Data‐readiness Scale for Companies
2.9 Data Science
2.10 Data Improvement Cycle
3 Big Data Handling, Storage and Solutions
3.1 Introduction
3.2 Big Data, Smart Data…
3.3 Big Data Solutions
3.4 Operational Systems supporting Business Processes
3.5 Analysis‐based Information Systems
3.6 Structured Data – Data Warehouses
3.7 Poly‐structured (Unstructured) Data – NoSQL Technologies
3.8 Data Structures and Latency
3.9 Data Marts
4 Data Mining as a Key Technique for Monetisation
4.1 Introduction
4.2 Population and Sample
4.3 Supervised and Unsupervised Methods
4.4 Knowledge‐discovery Techniques
4.5 Theory of Modelling
4.6 The Data Mining Process
5 Background and Supporting Statistical Techniques
5.1 Introduction
5.2 Variables
5.3 Key Performance Indicators
5.4 Taming the Data
5.5 Data Visualisation and Exploration of Data
5.6 Basic Statistics
5.7 Feature Selection and Reduction of Variables
5.8 Sampling
5.9 Statistical Methods for Proving Model Quality and Generalisability and Tuning Models
6 Data Analytics Methods for Monetisation
6.1 Introduction
6.2 Predictive Modelling Techniques
6.3 Pattern Detection Methods
6.4 Methods in practice
7 Monetisation of Data and Business Issues: Overview
7.1 Introduction
7.2 General Strategic Opportunities
7.3 Data as a Donation
7.4 Data as a Resource
7.5 Data Leading to New Business Opportunities
7.6 Information Brokering using Data
7.7 Connectivity as a Strategic Opportunity
7.8 Problem‐solving Methodology
8 How to Create Profit Out of Data
8.1 Introduction
8.2 Business Models for Monetising Data
8.3 Data Product Design
8.4 Value of Data
8.5 Charging Mechanisms
8.6 Connectivity as an Opportunity for Streamlining a Business
9 Some Practicalities of Monetising Data
9.1 Introduction
9.2 Practicalities
9.3 Special focus on SMEs
9.4 Special Focus on B2B Lead Generation
9.5 Legal and Ethical Issues
9.6 Payments
9.7 Innovation
10 Case Studies
10.1 Job Scheduling in Utilities
10.2 Shipping
10.3 Online Sales or Mail Order
10.4 Intelligent Profiling with Loyalty Card Schemes
10.5 Social Media: a Mechanism to Collect and Use Contributor Data
10.6 Making a Business out of Boring Statistics
10.7 Social Media and Web Intelligence Services
10.8 Service Provider
10.9 Data Source
10.10 Industry 4.0: Metamodelling using Simulated Data
10.11 Industry 4.0: Modelling Pricing Data in Manufacturing
10.12 Monetising Data in an SME
10.13 Making Sense of Public Finance and Other Data
10.14 Benchmarking who is the Best in the Market
10.15 Change of Shopping Habits Part I
10.16 Change of Shopping Habits Part II
10.17 Change of Shopping Habits Part III
10.18 Service Providers, Households and Facility Management
10.19 Insurance, Healthcare and Risk Management
10.20 Mobility and Connected Cars
10.21 Production and Automation in Industry 4.0
Bibliography
Glossary
Index
End User License Agreement

List of Tables

Chapter 02
Table 2.1 Typical internal and external data in information systems.
Table 2.2 Extract of sales data.
Table 2.3 Company sales data analytics.
Table 2.4 Internal sales data enriched with external data.
Table 2.5 Scales of measurement examples.
Table 2.6 Checklist for data readiness.
Chapter 04
Table 4.1 Confusion matrix for comparing models.
Chapter 05
Table 5.1 Partially tamed data.
Table 5.2 Outcomes of a hypothesis test.
Table 5.3 Typical significance borders.
Table 5.4 Examples of statistical tests.
Table 5.5 Example of a contingency table.
Table 5.6 Target proportions.
Table 5.7 Confusion matrix.
Table 5.8 Gains chart.
Table 5.9 Non‐cumulative lift and gains table.
Chapter 06
Table 6.1 Example of a contingency table.
Table 6.2 Analysis table for goodness of fit.
Chapter 08
Table 8.1 Business models for types of exchange.
Table 8.2 Business models for B2C selling.
Table 8.3 Business models for service providers.
Chapter 09
Table 9.1 Business model canvas of the comparisons between data brokers and insight innovators.
Chapter 10
Table 10.1 Summary of case studies.
Table 10.2 Risk scores in a simple case.
Table 10.3 Distribution of risk scores in different seasons.
Table 10.4 Allowable stress for soft impact.
Table 10.5 Parameters used to describe a four‐sided glass panel.
Table 10.6 Data dimensions and stakeholders.

List of Illustrations

Chapter 01
Figure 1.1 Where does big data come from?
Figure 1.2 Big data empowers business.
Figure 1.3 Roadmap to success.
Figure 1.4 Wish list for generating money out of data.
Figure 1.5 Monetising data.
Chapter 02
Figure 2.1 Deming’s ‘Plan, Do, Check, Act’ quality improvement cycle.
Figure 2.2 Six Sigma quality improvement cycle.
Figure 2.3 Example of data maturity model.
Figure 2.4 Data improvement cycle.
Chapter 03
Figure 3.1 Big data definition.
Figure 3.2 Internet of things timeline.
Figure 3.3 Example data structure.
Figure 3.4 NoSQL management systems.
Figure 3.5 Big data structure and latency.
Chapter 04
Figure 4.1 Supervised learning.
Figure 4.2 Unsupervised learning.
Figure 4.3 The CRISP‐DM process.
Figure 4.4 The SEMMA process.
Figure 4.5 General representation of the data mining process.
Figure 4.6 Time periods for data mining process.
Figure 4.7 Stratified sampling.
Figure 4.8 Lift chart for model comparison.
Figure 4.9 Lift chart at small scale.
Figure 4.10 An example of model control.
Chapter 05
Figure 5.1 Raw data from a customer transaction.
Figure 5.2 Bar chart of relative frequencies.
Figure 5.3 Example of cumulative view.
Figure 5.4 Example of a Pareto chart.
Figure 5.5 Example of a pie chart.
Figure 5.6 Scatterplot of company age and auditing behaviour with LOWESS line.
Figure 5.7 Scatterplot of design options.
Figure 5.8 Ternary diagram showing proportions.
Figure 5.9 Radar plot of fitness panel data.
Figure 5.10 Example of a word cloud.
Figure 5.11 Example of a mind map.
Figure 5.12 Location heat map.
Figure 5.13 Density map for minivans.
Figure 5.14 SPC chart of shipping journeys.
Figure 5.15 Decision tree analysis for older workers.
Figure 5.16 Gains chart.
Figure 5.17 Lift chart.
Figure 5.18 ROC curve development during predictive modelling.
Chapter 06
Figure 6.1 Example of logistic regression.
Figure 6.2 Corrected logistic regression.
Figure 6.3 Decision tree.
Figure 6.4 Artificial neural network.
Figure 6.5 Bayesian network analysis of survey data.
Figure 6.6 Bayesian network used to explore what‐if scenarios.
Figure 6.7 Plot of non‐linear separation on a hyperplane.
Figure 6.8 Dendrogram from hierarchical cluster analysis.
Figure 6.9 Parallel plot from K‐means cluster analysis.
Figure 6.10 Kohonen network with two‐dimensional arrangement of the output neurons.
Figure 6.11 SOM output.
Figure 6.12 T‐SNE output.
Figure 6.13 Correspondence analysis output: scatterplot of RPC2 vs RPC1, the two principal dimensions showing how the row profiles in a contingency table differ from each other.
Figure 6.14 Association rules.
Figure 6.15 Association analysis of products.
Figure 6.16 Comparison of customer base and population.
Figure 6.17 Relationship between energy usage and deprivation: scatterplot of mean AQ vs percentage of households deprived.
Figure 6.18 Map showing prices.
Chapter 07
Figure 7.1 Strategic opportunities.
Figure 7.2 How data can boost top‐ and bottom‐line results.
Figure 7.3 Typical data request.
Figure 7.4 Observed data and usage.
Figure 7.5 Maslow’s hierarchy of needs.
Figure 7.6 Data sources to empower consumer business.
Figure 7.7 Ready information on market opportunities.
Figure 7.8 Word cloud from keyword occurrences.
Figure 7.9 Using different data sources for analytics.
Figure 7.10 Daily sleep patterns.
Figure 7.11 Predictive analytics in insurance.
Chapter 08
Figure 8.1 Pathways to monetising data.
Figure 8.2 Segmentation features of walk‐in customers.
Figure 8.3 Business opportunities.
Chapter 09
Figure 9.1 Paths to monetisation.
Figure 9.2 Pareto diagram of customer compliments.
Figure 9.3 Graphical dashboard.
Figure 9.4 Decrypting the DNA of the best existing customers.
Figure 9.5 Aspects of digital maturity.
Figure 9.6 Closed loop of B2B customer profiling – continuous learning.
Figure 9.7 Automated B2B lead generation system.
Figure 9.8 New methods, new insights, smart business.
Figure 9.9 Misleading scatterplots.
Figure 9.10 Scatterplot with multiple features.
Figure 9.11 Histogram of suspicious‐quality recordings.
Chapter 10
Figure 10.1 The evolution of data analytics.
Figure 10.2 Cumulative distribution of risk scores.
Figure 10.3 Data sources in the shipping industry.
Figure 10.4 Optimum speed recommendation.
Figure 10.5 Pruned decision tree.
Figure 10.6 Detail from decision tree.
Figure 10.7 Customised communication.
Figure 10.8 Individualised communication.
Figure 10.9 Complexity of data mining steps.
Figure 10.10 Data in the customer journey.
Figure 10.11 Intelligent profiles and segments in B2C.
Figure 10.12 Personalised journey.
Figure 10.13 The reach of social media.
Figure 10.14 The power of social media.
Figure 10.15 Using peer group behaviour.
Figure 10.16 National statistics oil prices.
Figure 10.17 Example of reports portal.
Figure 10.18 Making a business out of boring statistics.
Figure 10.19 Right place, right time.
Figure 10.20 Social media information summarised.
Figure 10.21 Visualisation of user engagement.
Figure 10.22 Concept of newsletter tracking.
Figure 10.23 Example report on testing different versions.
Figure 10.24 Customer profile details.
Figure 10.25 Company profile details.
Figure 10.26 Example of glass facades in buildings.
Figure 10.27 Half normal plot of a screening experiment.
Figure 10.28 Predicted vs calculated resistance factor with validation.
Figure 10.29 Residual plot of prices.
Figure 10.30 Visualisation of groups of products.
Figure 10.31 Open data available to enrich company data.
Figure 10.32 Diffusion map showing clusters of shares.
Figure 10.33 Sampling approach for benchmarking in China.
Figure 10.34 Three‐step approach to survey analytics.
Figure 10.35 Skateboard offer.
Figure 10.36 Customer journey.
Figure 10.37 Example of customer segments.
Figure 10.38 Virtual changing room.
Figure 10.39 Virtual supermarket at bus stop.
Figure 10.40 Input from miscellaneous IoT sensors.
Figure 10.41 Appealing sleep sensor display.
Figure 10.42 Sensors connected by mobile phone.
Figure 10.43 The connected car.
Figure 10.44 The new connected eco‐system.
Figure 10.45 Industry 4.0.
Figure 10.46 Industry 4.0 in action.

This edition first published 2018
© 2018 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Andrea Ahlemeyer‐Stubbe and Shirley Coleman to be identified as the authors of this work has been asserted in accordance with law.

Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
9600 Garsington Road, Oxford, OX4 2DQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication data applied for

ISBN: 9781119125136

Cover Design: Wiley
Cover Images: (Business people) © JohnnyGreig/Gettyimages; (Currencies) © Inok/iStockphoto

Preface

When we finished writing our Practical Guide to Data Mining for Business and Industry, we realised that there were still things to say. The growth of interest in data has been enormous and there are now even more opportunities than during the earlier years when there was a steady awakening to the importance of data for business and industry.

Data analytics appears on billboards in mainstream locations such as airports, and even mathematics is being coupled with adverts for cars in a positive way. Everyone is aware that they have data and has seen the graphs and predictions that analysis produces.

The book describes how any business can be uplifted by monetising data. We show how data is generated by sensors, smart homes, apps, website visits, social network usage, digital communication, purchase behaviour, credit card usage, connected car devices and self‐quantification. Enriched by integrating with official statistics, analysis of these datasets brings real business advantage.

The book invites the reader to think about their data resources and be creative in how they use them. The book is not organised as a technical text but includes many examples of innovative applications of statistical thinking and analytical approaches. It does not propose original statistical or machine learning methods but focuses on applications of data‐driven approaches. It is general in scope and can thus serve as an introductory text. It has a management focus and the reader can judge for themselves where they can use the ideas. The structure of the book aims to be logical and cover the whole loop of using data for business decisions. The idea of exploring and giving advice on how to convert data into money is really appealing.

Even after several years of excitement about big data, there are few practical case studies available. For this reason, we include 21 in the final chapter to give realistic suggestions for what to do. The other chapters of the book give necessary background and motivational content.

It is timely to publish this book now, as big data and data analytics have captured the imagination of business and public alike. Data can be seen as the most powerful resource of the future; we believe it has more influence on the wealth of companies and people than any other resource. The authors have long been proponents of data analysis for business advantage and so it is with delight that we can collate our experience and rationale and share it with other people.

The ideas in this book have arisen from many hours of fascinating consulting work. We have felt honoured to be allowed to immerse ourselves in our clients’ company cultures and explore their data, and to have been able to present solutions that in many cases have brought great financial benefits.

We are grateful to all the business people we have worked with. Writing takes considerable time and our families and friends have been very accommodating. We thank them all very much.

1
The Opportunity

1.1 Introduction

Data awareness has swept across economic, political, occupational, social and personal life. Making sense of the fabulous opportunities afforded by such an abundance of data is the challenge of every business and each individual. The journey starts with understanding what data is, where it comes from, what insight it can give and how to extract it. These activities are sometimes referred to as descriptive analytics and predictive analytics. In descriptive analytics data is explored by looking at summary statistics and graphics, and the results are highly accessible and informative. Predictive analytics takes the analysis further and involves statistical approaches that utilise the full richness of the data and lead to predictive models to aid decision making.
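The distinction between the two kinds of analytics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the monthly figures are invented, the names `describe`, `fit_line` and `predict` are hypothetical, and a least-squares straight line stands in for the much richer predictive models discussed later in the book.

```python
from statistics import mean, median, stdev

# Hypothetical monthly figures, invented purely for illustration.
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
sales = [110.0, 118.0, 135.0, 148.0, 160.0, 185.0]


def describe(values):
    """Descriptive analytics: summarise what has already happened."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
        "min": min(values),
        "max": max(values),
    }


def fit_line(x, y):
    """Predictive analytics in its simplest form: a least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x_new):
    """Use the fitted line to estimate an outcome not yet observed."""
    slope, intercept = model
    return intercept + slope * x_new


summary = describe(sales)           # what the data looks like
model = fit_line(ad_spend, sales)   # how sales relate to spend
forecast = predict(model, 30.0)     # what might happen at a new spend level

print(summary)
print(f"Forecast sales at a spend of 30: {forecast:.1f}")
```

The summary answers the question ‘what happened?’, whereas the fitted line supports a decision about a spend level that has not yet been tried.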

This introductory chapter discusses the rise in data, changes in attitude to data and the advantages of getting to grips with accessing, analysing and utilising data. Definitions of concepts such as open data and big data are followed by guidance for reading the rest of the book.

1.2 The Rise of Data

There is much more data available and accessible than ever before.

Increasingly data is discussed in the popular press and, rather than shying away from figures, statistics and mathematics, advertisers are using these words more and more often. People are becoming more comfortable with data. This is clear from the increase in the use of self‐measurement and mapping facilities on personal devices such as mobile phones and tablets; people have a thirst for measuring everything in their daily life and like to try and control things to keep their life in good shape. Many people choose vehicles that are fitted with advanced digital measurement devices that manage engine performance and record fuel usage and location. All this is in addition to the increased automation of production lines and machinery, which has made copious measurement a familiar concept. A major contributor to the rise in importance of data is cheap data storage. For example, an external hard drive with terabytes of storage can be bought for the price of a visit to the hairdresser.

The common phrase to describe this changed world is ‘big data’ (Figure 1.1). A book on monetising data is inevitably about big data. We will interpret the term big data as data that is of a volume, variability and velocity that means common methods of appraisal are not appropriate. We need analytical methods to see the valuable patterns in it.

**Figure 1.1** Where does big data come from? (Diagram: the worldwide volume of data, with icons for smartphones/mobile, analytics, production, social networks, internet browsing, RFID, credit cards, shopping, cars, etc.)

Since the early 2000s there has been a drive to make data more available, giving rise to the open data movement. This promotes sharing of data gathered with the benefit of public funding and includes most official statistics, academic research output and some market, product and service evaluation data. The opening up of data has led to a steep increase in requests for access to even more data; the result is a burgeoning interest in action learning and enthusiasm to understand the potential waiting to be uncovered from the data. The profession of data scientist has evolved and now encapsulates the skills and knowledge to handle and generate insights from this information.

Figure 1.2 shows how big data combined with analytics might empower different areas of any business. The aim of this book is to encourage people to use their big data to work out exciting business opportunities, make major changes and optimise the way things are run.

**Figure 1.2** Big data empowers business. (Diagram: a circle labelled big data and analytics, a gear wheel labelled analytics and a gear wheel in a light bulb; above, chevrons labelled potential, action insights, put into action, etc.; below, six labelled boxes.)

1.3 Realising Data as an Opportunity

One of the key motivations for this book on monetising data is the sheer amount of under‐utilised data around. Hardly less important is the under‐achievement in terms of business benefit derived among those who do use their data. This suggests a two‐dimensional representation of the state of organisations, with one axis representing the usage of business data and the other axis representing the business benefit derived from it. Needless to say, the star performers are at the top right‐hand side of the resulting diagram in Figure 1.3. Being in the top right‐hand corner is better than being only at the top or only at the right‐hand side, because the two factors reinforce each other, giving greater benefits than either alone.

**Figure 1.3** Roadmap to success. (Graph of business benefit of data vs. usage of data, with regions labelled ‘In need of more business focus’, ‘All to play for!’ and ‘In need of more analytics’, and arrows pointing to ‘Top performers’.)

The marketplace is highly heterogeneous, with companies and institutions (all referred to as ‘organisations’ henceforth) differentiated in many ways, including:

sector
size of turnover
size in numbers of employees
maturity
research focus
product or service development.

The baseline against which organisations can benchmark themselves in Figure 1.3 is different for different types of organisation.
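As a rough sketch of how an organisation might place itself on the matrix in Figure 1.3, the function below maps two scores to the four labels shown in the figure. Several things here are assumptions rather than anything stated in the text: the 0 to 1 scoring scale, the function name `classify_organisation`, the default thresholds of 0.5, and the allocation of the two off‐diagonal labels (high usage with low benefit is read as ‘In need of more business focus’, high benefit with low usage as ‘In need of more analytics’). The thresholds are parameters precisely because the baseline differs by type of organisation.

```python
def classify_organisation(usage, benefit,
                          usage_threshold=0.5, benefit_threshold=0.5):
    """Place an organisation in one quadrant of the usage/benefit matrix.

    usage and benefit are hypothetical scores between 0 and 1; the
    thresholds stand for the baseline of the organisation's peer group.
    """
    high_usage = usage >= usage_threshold
    high_benefit = benefit >= benefit_threshold

    if high_usage and high_benefit:
        return "Top performers"
    if high_usage:
        # Data is used heavily but yields little benefit (assumed reading).
        return "In need of more business focus"
    if high_benefit:
        # Benefit is achieved with little use of data (assumed reading).
        return "In need of more analytics"
    return "All to play for!"


# A small retailer judged against a lower, sector-specific usage baseline.
print(classify_organisation(0.4, 0.7, usage_threshold=0.3))
```

Passing different thresholds for, say, a library and a large retailer lets each benchmark itself against its own baseline rather than a single universal one.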

Familiar players using big data include retail, finance, automotive manufacturers, health providers and process industries. In addition, the following are some of the less familiar organisations likely to be in possession of big data:

Sports societies: these may have larger turnover than expected and hold vast data banks of members’ details and their sporting activities.
Museums and galleries: these may have loyalty cards and multiple entry passes that yield customer details, frequency of visits, distance travelled, inclination and time spent at the venue.
Theatres and entertainment venues: these have names, addresses and frequency of attendance of attendees, and can study their catchment area and the popularity of different acts.
Libraries: these have names and addresses and members’ interests and usage.
Small retailers: these have records of itemised sales by day of week, time of day and season plus amount spent.
Craft and niche experts: these are often the first to be aware of trends and may have a global outlook.

All these organisations can take advantage of their data but they start from different points with different resources and capabilities; with good ideas they may have the opportunity to become winners in their own areas. Experience suggests that organisations have a secret wish list for generating money out of their data. Figure 1.4 shows the ranking we observed from our clients. However, this is just a snapshot and does not include business enrichment and transformation, which are also possible.

**Figure 1.4** Wish list for generating money out of data. (Horizontal bar graph: cost savings in business processes (61%), cost savings in IT (57%), profits from the business model (35%), competitive advantage (35%), etc.)

Figure 1.5 shows a very generalised process for monetising data. Data comes into the process and is first used for business monitoring, leading to business insights; these might generate business optimisation and might lead to monetisation and potential business transformation.

**Figure 1.5** Monetising data.

Despite differences in scale, the matrix in Figure 1.3 can help any organisation to map their current situation and plan their next steps to uplift their business.

1.4 Our Definition of Monetising Data

Data is the fundamental commodity, consisting of a representation of facts. However, when the data is summarised and illustrated it can lead to meaningful information, and assessing the meaningful information in context can lead to knowledge and wisdom.

Monetising data is more than just selling data and information. It includes everything where data is used in exchange for business advantage and supports business success. Large companies are often data rich and some have realised the advantage this gives them. Others consider themselves data rich but information poor because they have lots of data but it is not in a form that they can easily interpret or use to gain business insights. Statistical enthusiasm is a rare commodity but those businesses that pay attention to their data can find the answers to many of their policy and productivity questions. For example, scrutiny of data on sales easily yields information about seasonal trends; sales per customer might show shortfalls in maximising selling opportunities; total income might show overall success in attracting buyers, and so on.
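The three views of sales data mentioned above (seasonal trends, sales per customer and total income) can each be obtained with a simple aggregation. The sketch below assumes a hypothetical record layout of month, customer identifier and amount; the records themselves are invented for illustration and the function names are not taken from any particular system.

```python
from collections import defaultdict

# Invented sales records: (month number, customer id, amount spent).
sales = [
    (1, "A", 120.0), (1, "B", 80.0), (2, "A", 100.0),
    (6, "C", 40.0), (7, "B", 30.0),
    (12, "A", 300.0), (12, "C", 250.0), (12, "B", 200.0),
]


def total_by_month(records):
    """Seasonal view: total sales in each month."""
    totals = defaultdict(float)
    for month, _customer, amount in records:
        totals[month] += amount
    return dict(totals)


def sales_per_customer(records):
    """Customer view: total spend of each customer."""
    totals = defaultdict(float)
    for _month, customer, amount in records:
        totals[customer] += amount
    return dict(totals)


def total_income(records):
    """Overall view: income across all customers and months."""
    return sum(amount for _month, _customer, amount in records)


by_month = total_by_month(sales)
by_customer = sales_per_customer(sales)
peak_month = max(by_month, key=by_month.get)

print("Sales by month:", by_month)
print("Sales per customer:", by_customer)
print("Peak month:", peak_month)
print("Total income:", total_income(sales))
```

In these made‐up records the peak falls in December, which is the kind of seasonal pattern the monthly totals are meant to expose; the per‐customer totals would then be the starting point for spotting missed selling opportunities.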

Case studies and real data from our consulting practices are used throughout the book to illustrate the ideas, methods and techniques that are involved. As will be seen, most data can be monetised to bring benefit to the organisation. However, a lot of effort has to be expended to get the data into a suitable format for analysis. Data readiness can be assessed using tools that we will discuss. As analytics progresses, guidelines for data improvement become meaningful and we introduce the concept of the data improvement cycle to help organisations in continuous improvement and moving forward with their data analytics.

This book is aimed at managers in progressive organisations: managers who are keen to develop their own careers and who have the opportunity to suggest new ideas and innovative approaches for their organisation and influence how they are taken forward. The material requires a working knowledge of numbers and spreadsheets and of basic business principles. More specialised techniques, such as the use of decision tree analysis and predictive models, are fully explained. The main issue is the strength of desire to join the data revolution, and we hope that after reading this book you will be an excited convert.

1.5 Guidance on the Rest of the Book

The rest of the book is planned as follows. Chapters 2 and 3 address data collection and preparation issues, including the use of mapping and meteorological data as well as official statistics. Chapter 4 looks at general issues around data mining, both as a concept and as a mechanism for gathering insights from data. Chapters 5 and 6 address technical methods: Chapter 5 looks at descriptive analytics, starting with statistical methods for summarising data and graphical presentations, and Chapter 6 moves on to statistical testing, modelling, segmentation, network analysis and predictive analytics.

Chapters 7 and 8 introduce the different strategies, motivations, modes and concepts for monetising data and examine barriers and enablers for organisations seeking to realise the full potential of their data, their valuable asset. Monetisation can be viewed strategically and operationally. Strategically we can look at new business directions, step changes in thinking, disruptive innovation and new income streams. Operationally we can consider optimising current business models, and making better use of customer targeting and segmentation. In Chapter 7 we focus on strategic issues, whilst operational improvements of the existing business will be explored in Chapter 8. In Chapter 9 we will consider the practicalities of implementation, such as issues of ethics, privacy and security; loss of cultural and technical learning due to staff turnover; and the other dampers that have to be overcome before we can achieve strategic steps forward and improvement of the current situation.

The mutual importance of theory and practice has long been recognised. As Chebyshev, a founding father of much statistical theory, said back in the 19th century, ‘Progress in any discipline is most successful when theory and practice develop hand in hand’. Not only does practice benefit from theory but theory benefits from practice. So in Chapter 10 we describe a set of case studies in which monetisation has brought big gains and uplifted the business. Thus we will aim to end the book on a high note and provide inspiration to move forward.

If you locate yourself within the grid in Figure 1.3 you can see which parts of the book are most relevant for you. Those readers at the bottom left are probably at the beginning of their exploration of monetisation and could well jump to the case studies in Chapter 10 for motivation and then return to Chapter 2. Those at the bottom right have already gained substantial business advantages but could benefit from learning new statistical and data‐mining techniques to make deeper use of their data, as described in the more technical Chapters 3–6. Those at the top left already have experience of analysing data but need to realise a better business advantage and could go straight to Chapters 7–9. Those at the top right can read the whole book for revision purposes and further insights!

Note that we avoid naming specific companies. Instead we refer to them in a generic way and the reader is welcome to find example companies by searching online.

List of Tables

Table 2.1	Typical internal and external data in information systems
Table 2.2	Extract of sales data
Table 2.3	Company sales data analytics
Table 2.4	Internal sales data enriched with external data
Table 2.5	Scales of measurement examples
Table 2.6	Checklist for data readiness
Table 4.1	Confusion matrix for comparing models
Table 5.1	Partially tamed data
Table 5.2	Outcomes of a hypothesis test
Table 5.3	Typical significance borders
Table 5.4	Examples of statistical tests
Table 5.5	Example of a contingency table
Table 5.6	Target proportions
Table 5.7	Confusion matrix
Table 5.8	Gains chart
Table 5.9	Non‐cumulative lift and gains table
Table 6.1	Example of a contingency table
Table 6.2	Analysis table for goodness of fit
Table 8.1	Business models for types of exchange
Table 8.2	Business models for B2C selling
Table 8.3	Business models for service providers
Table 9.1	Business model canvas of the comparisons between data brokers and insight innovators
Table 10.1	Summary of case studies
Table 10.2	Risk scores in a simple case
Table 10.3	Distribution of risk scores in different seasons
Table 10.4	Allowable stress for soft impact
Table 10.5	Parameters used to describe a four‐sided glass panel
Table 10.6	Data dimensions and stakeholders