Cover
Preface
Acknowledgments
PART 1: Introduction and Theory
1. CHAPTER 1: Alternative Data: The Lay of the Land
  1. 1.1 INTRODUCTION
  2. 1.2 WHAT IS “ALTERNATIVE DATA”?
  3. 1.3 SEGMENTATION OF ALTERNATIVE DATA
  4. 1.4 THE MANY VS OF BIG DATA
  5. 1.5 WHY ALTERNATIVE DATA?
  6. 1.6 WHO IS USING ALTERNATIVE DATA?
  7. 1.7 CAPACITY OF A STRATEGY AND ALTERNATIVE DATA
  8. 1.8 ALTERNATIVE DATA DIMENSIONS
  9. 1.9 WHO ARE THE ALTERNATIVE DATA VENDORS?
  10. 1.10 USAGE OF ALTERNATIVE DATASETS ON THE BUY SIDE
  11. 1.11 CONCLUSION
  12. NOTES
2. CHAPTER 2: The Value of Alternative Data
  1. 2.1 INTRODUCTION
  2. 2.2 THE DECAY OF INVESTMENT VALUE
  3. 2.3 DATA MARKETS
  4. 2.4 THE MONETARY VALUE OF DATA (PART I)
  5. 2.5 EVALUATING (ALTERNATIVE) DATA STRATEGIES WITH AND WITHOUT BACKTESTING
  6. 2.6 THE MONETARY VALUE OF DATA (PART II)
  7. 2.7 THE ADVANTAGES OF MATURING ALTERNATIVE DATASETS
  8. 2.8 SUMMARY
  9. NOTES
3. CHAPTER 3: Alternative Data Risks and Challenges
  1. 3.1 LEGAL ASPECTS OF DATA
  2. 3.2 RISKS OF USING ALTERNATIVE DATA
  3. 3.3 CHALLENGES OF USING ALTERNATIVE DATA
  4. 3.4 AGGREGATING THE DATA
  5. 3.5 SUMMARY
  6. NOTES
4. CHAPTER 4: Machine Learning Techniques
  1. 4.1. INTRODUCTION
  2. 4.2. MACHINE LEARNING: DEFINITIONS AND TECHNIQUES
  3. 4.3. WHICH TECHNIQUE TO CHOOSE?
  4. 4.4. ASSUMPTIONS AND LIMITATIONS OF THE MACHINE LEARNING TECHNIQUES
  5. 4.5. STRUCTURING IMAGES
  6. 4.6. NATURAL LANGUAGE PROCESSING (NLP)
  7. 4.7. SUMMARY
  8. NOTES
5. CHAPTER 5: The Processes behind the Use of Alternative Data
  1. 5.1. INTRODUCTION
  2. 5.2. STEPS IN THE ALTERNATIVE DATA JOURNEY
  3. 5.3. STRUCTURING TEAMS TO USE ALTERNATIVE DATA
  4. 5.4. DATA VENDORS
  5. 5.5. SUMMARY
  6. NOTES
6. CHAPTER 6: Factor Investing
  1. 6.1. INTRODUCTION
  2. 6.2. FACTOR MODELS
  3. 6.3. THE DIFFERENCE BETWEEN CROSS-SECTIONAL AND TIME SERIES TRADING APPROACHES
  4. 6.4. WHY FACTOR INVESTING?
  5. 6.5. SMART BETA INDICES USING ALTERNATIVE DATA INPUTS
  6. 6.6. ESG FACTORS
  7. 6.7. DIRECT AND INDIRECT PREDICTION
  8. 6.8. SUMMARY
  9. NOTES
PART 2: Practical Applications
1. CHAPTER 7: Missing Data: Background
  1. 7.1. INTRODUCTION
  2. 7.2. MISSING DATA CLASSIFICATION
  3. 7.3. LITERATURE OVERVIEW OF MISSING DATA TREATMENTS
  4. 7.4. SUMMARY
  5. NOTES
2. CHAPTER 8: Missing Data: Case Studies
  1. 8.1. INTRODUCTION
  2. 8.2. CASE STUDY: IMPUTING MISSING VALUES IN MULTIVARIATE CREDIT DEFAULT SWAP TIME SERIES
  3. 8.3. CASE STUDY: SATELLITE IMAGES
  4. 8.4. SUMMARY
  5. 8.5. APPENDIX: GENERAL DESCRIPTION OF THE MICE PROCEDURE
  6. 8.6. APPENDIX: SOFTWARE LIBRARIES USED IN THIS CHAPTER
  7. NOTES
3. CHAPTER 9: Outliers (Anomalies)
  1. 9.1. INTRODUCTION
  2. 9.2. OUTLIERS DEFINITION, CLASSIFICATION, AND APPROACHES TO DETECTION
  3. 9.3. TEMPORAL STRUCTURE
  4. 9.4. GLOBAL VERSUS LOCAL OUTLIERS, POINT ANOMALIES, AND MICRO-CLUSTERS
  5. 9.5. OUTLIER DETECTION PROBLEM SETUP
  6. 9.6. COMPARATIVE EVALUATION OF OUTLIER DETECTION ALGORITHMS
  7. 9.7. APPROACHES TO OUTLIER EXPLANATION
  8. 9.8. CASE STUDY: OUTLIER DETECTION ON FED COMMUNICATIONS INDEX
  9. 9.9. SUMMARY
  10. 9.10. APPENDIX
  11. NOTES
4. CHAPTER 10: Automotive Fundamental Data
  1. 10.1. INTRODUCTION
  2. 10.2. DATA
  3. 10.3. APPROACH 1: INDIRECT APPROACH
  4. 10.4. APPROACH 2: DIRECT APPROACH
  5. 10.5. GAUSSIAN PROCESSES EXAMPLE
  6. 10.6. SUMMARY
  7. 10.7. APPENDIX
  8. NOTES
5. CHAPTER 11: Surveys and Crowdsourced Data
  1. 11.1. INTRODUCTION
  2. 11.2. SURVEY DATA AS ALTERNATIVE DATA
  3. 11.3. THE DATA
  4. 11.4. THE PRODUCT
  5. 11.5. CASE STUDIES
  6. 11.6. SOME TECHNICAL CONSIDERATIONS ON SURVEYS
  7. 11.7. CROWDSOURCING ANALYST ESTIMATES SURVEY
  8. 11.8. ALPHA CAPTURE DATA
  9. 11.9. SUMMARY
  10. 11.10. APPENDIX
  11. NOTES
6. CHAPTER 12: Purchasing Managers' Index
  1. 12.1. INTRODUCTION
  2. 12.2. PMI PERFORMANCE
  3. 12.3. NOWCASTING GDP GROWTH
  4. 12.4. IMPACTS ON FINANCIAL MARKETS
  5. 12.5. SUMMARY
  6. NOTES
7. CHAPTER 13: Satellite Imagery and Aerial Photography
  1. 13.1. INTRODUCTION
  2. 13.2. FORECASTING US EXPORT GROWTH
  3. 13.3. CAR COUNTS AND EARNINGS PER SHARE FOR RETAILERS
  4. 13.4. MEASURING CHINESE PMI MANUFACTURING WITH SATELLITE DATA
  5. 13.5. SUMMARY
8. CHAPTER 14: Location Data
  1. 14.1. INTRODUCTION
  2. 14.2. SHIPPING DATA TO TRACK CRUDE OIL SUPPLIES
  3. 14.3. MOBILE PHONE LOCATION DATA TO UNDERSTAND RETAIL ACTIVITY
  4. 14.4. TAXI RIDE DATA AND NEW YORK FED MEETINGS
  5. 14.5. CORPORATE JET LOCATION DATA AND M&A
  6. 14.6. SUMMARY
  7. NOTE
9. CHAPTER 15: Text, Web, Social Media, and News
  1. 15.1. INTRODUCTION
  2. 15.2. COLLECTING WEB DATA
  3. 15.3. SOCIAL MEDIA
  4. 15.4. NEWS
  5. 15.5. OTHER WEB SOURCES
  6. 15.6. SUMMARY
  7. NOTES
10. CHAPTER 16: Investor Attention
  1. 16.1. INTRODUCTION
  2. 16.2. READERSHIP OF PAYROLLS TO MEASURE INVESTOR ATTENTION
  3. 16.3. GOOGLE TRENDS DATA TO MEASURE MARKET THEMES
  4. 16.4. INVESTOPEDIA SEARCH DATA TO MEASURE INVESTOR ANXIETY
  5. 16.5. USING WIKIPEDIA TO UNDERSTAND PRICE ACTION IN CRYPTOCURRENCIES
  6. 16.6. ONLINE ATTENTION FOR COUNTRIES TO INFORM EMFX TRADING
  7. 16.7. SUMMARY
11. CHAPTER 17: Consumer Transactions
  1. 17.1. INTRODUCTION
  2. 17.2. CREDIT AND DEBIT CARD TRANSACTION DATA
  3. 17.3. CONSUMER RECEIPTS
  4. 17.4. SUMMARY
  5. NOTE
12. CHAPTER 18: Government, Industrial, and Corporate Data
  1. 18.1. INTRODUCTION
  2. 18.2. USING INNOVATION MEASURES TO TRADE EQUITIES
  3. 18.3. QUANTIFYING CURRENCY CRISIS RISK
  4. 18.4. MODELING CENTRAL BANK INTERVENTION IN CURRENCY MARKETS
  5. 18.5. SUMMARY
13. CHAPTER 19: Market Data
  1. 19.1. INTRODUCTION
  2. 19.2. RELATIONSHIP BETWEEN INSTITUTIONAL FX FLOW DATA AND FX SPOT
  3. 19.3. UNDERSTANDING LIQUIDITY USING HIGH-FREQUENCY FX DATA
  4. 19.4. SUMMARY
  5. NOTE
14. CHAPTER 20: Alternative Data in Private Markets
  1. 20.1. INTRODUCTION
  2. 20.2. DEFINING PRIVATE EQUITY AND VENTURE CAPITAL FIRMS
  3. 20.3. PRIVATE EQUITY DATASETS
  4. 20.4. UNDERSTANDING THE PERFORMANCE OF PRIVATE FIRMS
  5. 20.5. SUMMARY
Conclusions
1. SOME LAST WORDS
References
About the Authors
Index
End User License Agreement

List of Tables

Chapter 1
1. TABLE 1.1 Segmentation of alternative data.
Chapter 4
1. TABLE 4.1 Financial (and non-) problems and suggested modeling techniques.
Chapter 8
1. TABLE 8.1 Summary statistics for MRD metrics for cluster 1 in ...
2. TABLE 8.2 Summary statistics for MRD metrics for cluster 2.
3. TABLE 8.3 Summary statistics for MRD metrics for cluster 2 whe...
4. TABLE 8.4 Summary statistics for MRD metrics for cluster 3 in comparison.
5. TABLE 8.5 Summary statistics for MRD metrics for cluster 3, wh...
Chapter 9
1. TABLE 9.1 Datasets used in comparative analysis of outlier detection algorithms....
Chapter 10
1. TABLE 10.1 Chevrolet Cruze: Top 10 countries unit sales/registrations in 2017....
2. TABLE 10.2 Long/short-portfolio sizes by number of tradeable companies.
3. TABLE 10.3 Top 10 strategies when ranked by CAGR. L – lon...
4. TABLE 10.4 Equal weighted benchmarks.
5. TABLE 10.5 Supporting statistics for top-ranked strategies by CAGR.
6. TABLE 10.6 Automotive factors created from the alternative data set.
7. TABLE 10.7 Freshest automotive factors summary statistics.
8. TABLE 10.8 Top 10 alt data strategies according to CAGR.
9. TABLE 10.9 Long top 33% strategy excess returns vs equal weight...
10. TABLE 10.10 Time averaged Spearman rank correlations.
11. TABLE 10.11 Factors CAGRs.
12. TABLE 10.12 Lags applied in automotive factor calculations.
Chapter 12
1. TABLE 12.1 GDP Growth correlations with % changes of select indicators.
2. TABLE 12.2 Model performance (2010Q1–2018Q1).
Chapter 13
1. TABLE 13.1 Annual correlation between exports, lights, and GDP.
2. TABLE 13.2 Comparing model forecasts through the average percentage derivatio...

List of Illustrations

Chapter 1
1. FIGURE 1.1 The four stages of data transformation: from raw data to a strate...
2. FIGURE 1.2 US GDP growth rate versus PMI; correlation 68%; time period: Q1 2...
3. FIGURE 1.3 China GDP growth rate versus PMI; correlation 69%; time period: Q...
4. FIGURE 1.4 Examples of alternative data usage by different market players....
5. FIGURE 1.5 Alternative data adoption curve: investment management constituen...
6. FIGURE 1.6 Impact of transaction costs on the information ratio of Cuemacro'...
7. FIGURE 1.7 Alternative datasets released commercially per year.
8. FIGURE 1.8 Brands most associated with alternative data.
9. FIGURE 1.9 Total spend on alternative data by buy side.
10. FIGURE 1.10 “Alternative datasets” derived from web scraping: most popular a...
Chapter 2
1. FIGURE 2.1 Different discriminatory pricing mechanisms.
2. FIGURE 2.2 US change in nonfarm payrolls versus ADP private payroll change....
Chapter 3
1. FIGURE 3.1 Comparison of data protection laws around the world.
Chapter 4
1. FIGURE 4.1 Balance between high bias and high variance.
2. FIGURE 4.2 Visualizing linear regression.
3. FIGURE 4.3 Visualizing logistic regression.
4. FIGURE 4.4 SVM example: The black line is the decision boundary.
5. FIGURE 4.5 Kernel trick example.
6. FIGURE 4.6 Visualizing linear regression as a neural network.
7. FIGURE 4.7 Visualizing logistic regression as a neural network.
8. FIGURE 4.8 Visualizing softmax regression as a neural network.
9. FIGURE 4.9 Multi-layer perceptron with 1 hidden layer.
10. FIGURE 4.10 Convolutional neural network with 3 convolutional layers and 2 ...
11. FIGURE 4.11 Various edge, corner, and blob-based feature detectors.
12. FIGURE 4.12 Dominant feature detection algorithms and their properties.
13. FIGURE 4.13 Frequency of the words “burger” and “king.”
Chapter 5
1. FIGURE 5.1 Cost of setting up a data science team.
Chapter 6
1. FIGURE 6.1 Probabilistic Graphical Model (PGM) showing a potential modeling ...
2. FIGURE 6.2 Another potential modeling sequence (Model B).
3. FIGURE 6.3 A third potential modeling sequence (Model C).
Chapter 7
1. FIGURE 7.1 Average rank for all the classifiers. Column “Avg.” is the averag...
2. FIGURE 7.2 Average rank for the rule induction learning methods.
3. FIGURE 7.3 Average rank for the approximate methods.
4. FIGURE 7.4 Average rank for the lazy learning methods.
5. FIGURE 7.5 Best imputation methods for each group. The three best rankings p...
6. FIGURE 7.6 Methods for pattern classification with missing data. This scheme...
7. FIGURE 7.7 Misclassification error rate (mean ± standard d...
8. FIGURE 7.8 Misclassification error rate (mean ± standard d...
9. FIGURE 7.9 Misclassification error rate (mean ± standard d...
10. FIGURE 7.10 Error rates of input datasets by using LERS new classification....
11. FIGURE 7.11 Error rates of input datasets by using LERS naï...
12. FIGURE 7.12 Mean, standard deviation, and MSE values for the AUC (area under...
Chapter 8
1. FIGURE 8.1 Clustering for CDS time series data: (1) relatively small fractio...
2. FIGURE 8.2 Example of DINEOF imputation for synthetic 2D data.
3. FIGURE 8.3 Top; Example of complete time series data (ticker 1, cluster 2). ...
4. FIGURE 8.4 Amelia (top) and MICE (bottom) imputed time series for data in Fi...
5. FIGURE 8.5 RF imputation (dots) for data in Figure 8.2-3, compared with the ...
6. FIGURE 8.6 DINEOF (top) and MSSA (bottom) imputation (dots) for data in Figu...
7. FIGURE 8.7 Example of complete time series data (ticker 40, cluster 3). The ...
8. FIGURE 8.8 Amelia imputed time series for data in Figure 8.7 (dots), compare...
9. FIGURE 8.9 MSSA imputed time series for data in Figure 8.7 (dots), compared ...
10. FIGURE 8.10 Example of DINEOF imputation for car park data.
11. FIGURE 8.11 Car park image.
12. FIGURE 8.12 Car park image with 50% removed.
13. FIGURE 8.13 Car park image with missing pixels mean filled, pre-DINEOF.
14. FIGURE 8.14 Car park image with missing pixels mean filled, post-DINEOF.
15. FIGURE 8.15 Car park image with missing pixels local mean filled, pre-DINEOF...
16. FIGURE 8.16 Car park image with missing pixels local mean filled, post-DINEO...
Chapter 9
1. FIGURE 9.1 An example of LOF score visualization in 2 dimensions: radius of ...
2. FIGURE 9.2 An illustration of potential difficulties in choosing a normal ne...
3. FIGURE 9.3 An illustration of a case where rank statistic does not provide t...
4. FIGURE 9.4 Outliers explanation in problematic situations: measuring skills ...
5. FIGURE 9.5 Histogram plot of log(text length).
6. FIGURE 9.6 Event types of Fed communication.
7. FIGURE 9.7 Histogram plot of CScores.
8. FIGURE 9.8 Most talkative Fed speakers.
9. FIGURE 9.9 Event types of Fed communications flagged as outliers by unsuperv...
Chapter 10
1. FIGURE 10.1 Mean percent of sales volume known x-months after the end of the...
2. FIGURE 10.2 Mean percent of production volume known x-months after the end o...
3. FIGURE 10.3 The process followed.
4. FIGURE 10.4 Q_pct_delta_ffo quintile CAGRs at 3-months clairvoyance.
5. FIGURE 10.5 Q_pct_delta_ffo returns plot vs quarterly benchmark.
6. FIGURE 10.6 Heatmap of stocks held over time for Q_pct_delta_ffo at 3-months...
7. FIGURE 10.7 revenues_sales_prev_3m_sum_prev_1m_pct_change returns plot vs qu...
8. FIGURE 10.8 revenues_sales_prev_3m_sum_prev_1m_pct_change quintile CAGR.
9. FIGURE 10.9 ww_market_share_prev_1m_pct_change returns plot vs quarterly ben...
10. FIGURE 10.10 ww_market_share_prev_1m_pct_change quintile CAGR.
11. FIGURE 10.11 usa_sales_volume_prev_12m_sum_prev_3m_pct_change returns plot v...
12. FIGURE 10.12 usa_sales_volume_prev_12m_sum_prev_3m_pct_change quintile CAGR....
Chapter 11
1. FIGURE 11.1 Hierarchy of contributors.
2. FIGURE 11.2 Typical timeline of a survey.
3. FIGURE 11.3 The process followed in a survey.
4. FIGURE 11.4 Are you currently playing JX Mobile III (test version)?
5. FIGURE 11.5 Are you willing to pay for JX Mobile III at launch?
6. FIGURE 11.6 How much do/did you spend per month for items in JX PC III?
7. FIGURE 11.7 Performance of the share price of Kingsoft (top) and the Hang Se...
8. FIGURE 11.8 Crude oil production by OPEC as estimated by several data provid...
9. FIGURE 11.9 Monthly changes in oil prices versus changes in OPEC oil supply ...
Chapter 12
1. FIGURE 12.1 Nowcasting Eurozone (EZ) GDP Growth in Q2 2018.
2. FIGURE 12.2 Eurozone GDP and Composite PMI.
3. FIGURE 12.3 GBP/USD intraday volatility around UK PMI Services over past 5 y...
Chapter 13
1. FIGURE 13.1 First picture from Explorer VI satellite.
2. FIGURE 13.2 Car count for Marks & Spencer versus earnings (actual and estima...
3. FIGURE 13.3 Regressing consensus and car count data with earnings per share ...
4. FIGURE 13.4 Regressing news sentiment and car count data with earnings per s...
5. FIGURE 13.5 China SpaceKnow's satellite manufacturing index versus official ...
6. FIGURE 13.6 Surprises in China PMI manufacturing versus consensus, SMI and h...
Chapter 14
1. FIGURE 14.1 Comparing AIS versus official crude oil exports.
2. FIGURE 14.2 Thasos Foot Traffic Index YoY versus US Retail Sales YoY.
3. FIGURE 14.3 Trading XRT based on Thasos Mall Foot Traffic index.
4. FIGURE 14.4 Comparing visits to particular malls.
5. FIGURE 14.5 Comparing Walmart's actual earnings per share against consensus ...
6. FIGURE 14.6 Regressing consensus estimates and footfall against reported ear...
7. FIGURE 14.7 Regressing footfall, news, and Twitter data against reported ear...
8. FIGURE 14.8 Corporate aircraft visits at takeover targets.
Chapter 15
1. FIGURE 15.1 Happiest and saddest words in Hedonometer's corpus.
2. FIGURE 15.2 Hedonometer index for latter part of 2018 till early 2019.
3. FIGURE 15.3 Average Hedonometer score by day of the week.
4. FIGURE 15.4 Happiness Sentiment Index against S&P 500.
5. FIGURE 15.5 Surprise in nonfarm payrolls vs. USD/JPY 1-minute move after rel...
6. FIGURE 15.6 Twitter-based forecast for US change in nonfarm payrolls versus ...
7. FIGURE 15.7 Trading EUR/USD and USD/JPY on an intraday basis around NFP.
8. FIGURE 15.8 S&P 500 versus article count on it on Bloomberg News.
9. FIGURE 15.9 Average daily count of articles per ticker.
10. FIGURE 15.10 USD/JPY news sentiment score versus weekly returns.
11. FIGURE 15.11 News versus trend information ratio.
12. FIGURE 15.12 News versus trend correlation.
13. FIGURE 15.13 News versus trend model returns.
14. FIGURE 15.14 News versus trend model YoY returns.
15. FIGURE 15.15 USD/JPY news volume versus 1M implied volatility.
16. FIGURE 15.16 Regressing news volume versus 1M implied volatility.
17. FIGURE 15.17 EUR/USD ON volatility add-on, implied volatility, realized vola...
18. FIGURE 15.18 EUR/USD ON implied volatility on FOMC days against FOMC news vo...
19. FIGURE 15.19 EUR/USD overnight volatility on FOMC days.
20. FIGURE 15.20 EUR/USD overnight volatility on ECB days.
21. FIGURE 15.21 FOMC sentiment index and UST 10Y yield changes over the past mo...
Chapter 16
1. FIGURE 16.1 “Payrolls” clicks on the days of US employment report.
2. FIGURE 16.2 Search volume for “world cup” in the United States.
3. FIGURE 16.3 Regressing Google Domestic Trend Indices.
4. FIGURE 16.4 S&P 500 versus Google Shock Sentiment.
5. FIGURE 16.5 S&P 500 vs Google Shock Sentiment scatter.
6. FIGURE 16.6 IAI vs VIX.
7. FIGURE 16.7 IAI vs VIX as a scatter plot.
8. FIGURE 16.8 Trading S&P 500 with IAI and VIX.
9. FIGURE 16.9 Turkey PVIX indicator vs USD/TRY 1M implied volatility.
10. FIGURE 16.10 Comparing English attention with local content for Brazil.
11. FIGURE 16.11 Trading a basket of EM currencies using macroeconomy “attention...
Chapter 17
1. FIGURE 17.1 Brazil YoY retail sales versus SpendingPulse Brazil retail sales...
2. FIGURE 17.2 Alternative data forecasts for Amazon revenue versus actual reve...
3. FIGURE 17.3 Comparing Shure versus Sennheiser (MoM) spend at Amazon.
Chapter 18
1. FIGURE 18.1 Long-only portfolios derived from visa and patent data.
2. FIGURE 18.2 Long-only portfolios derived from visa and patent data (in and o...
3. FIGURE 18.3 Average FX crisis rates, 2000–2017.
4. FIGURE 18.4 COFER data: Currency Composition of Official Foreign Exchange Re...
5. FIGURE 18.5 Comparing model estimates of CNY intervention versus official da...
Chapter 19
1. FIGURE 19.1 EUR/USD daily volume.
2. FIGURE 19.2 EUR/USD daily abs net flow.
3. FIGURE 19.3 Multiple regressions between spot returns and net flow.
4. FIGURE 19.4 EUR/USD index versus EUR/USD fund flow score.
5. FIGURE 19.5 Risk-adjusted returns for trend and daily flow-based strategies....
6. FIGURE 19.6 Daily flow and trend returns.
7. FIGURE 19.7 EUR/USD bid/ask spread over time.
8. FIGURE 19.8 EUR/USD and USD/JPY bid/ask spread by time of day.
Chapter 20
1. FIGURE 20.1 AUM of largest GPs (general partners) in billions USD.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750–8400, fax (978) 646–8600, or on the Web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748–6011, fax (201) 748–6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762–2974, outside the United States at (317) 572–3993 or fax (317) 572–4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Names: Denev, Alexander, author. | Amen, Saeed, 1982- author.

Title: The book of alternative data : a guide for investors, traders and risk managers / Alexander Denev, Saeed Amen.

Description: Hoboken, New Jersey : Wiley, [2020] | Includes bibliographical references and index.

Identifiers: LCCN 2020008783 (print) | LCCN 2020008784 (ebook) | ISBN 9781119601791 (hardback) | ISBN 9781119601814 (adobe pdf) | ISBN 9781119601807 (epub)

Subjects: LCSH: Investments | Financial risk management. | Big data.

Classification: LCC HG4529 .D47 2020 (print) | LCC HG4529 (ebook) | DDC 332.63/204—dc23

LC record available at https://lccn.loc.gov/2020008783

LC ebook record available at https://lccn.loc.gov/2020008784

Preface

Data permeates our world in ever-increasing amounts. This fact alone is not sufficient for data to be useful. Indeed, data has no utility if it is devoid of information that could aid our understanding. Data needs to be insightful to be of use, and it also needs to be processed in the appropriate way. In the pre-Big Data age, statistics such as averages, standard deviations, and correlations were calculated on structured datasets to illuminate our understanding of the world. Models were calibrated on a small number of input variables, which were often well “understood,” to obtain an output via well-trodden methods such as linear regression.

However, interpreting Big Data, and hence alternative data, comes with many challenges. Big Data is characterized by properties such as volume, velocity, variety, and other Vs, which we will discuss in this book. It is impossible to calculate statistics unless datasets are well structured and the relevant features have been extracted. When it comes to prediction, the input variables derived from Big Data are numerous, and traditional statistical methods can be prone to overfitting. Moreover, statistics and models built on this data must sometimes be recomputed frequently and dynamically to account for the ever-changing nature of the data in our high-frequency world.

Thanks to technological and methodological advances, understanding Big Data, and by extension alternative data, has become a tractable problem. Extracting features from messy, enormous volumes of data is now possible thanks to recent developments in artificial intelligence and machine learning. Cloud infrastructure provides elastic and powerful computation to manage such data flows and to train models both quickly and efficiently. Most of the programming languages in use today are open source, and many, such as Python, have a large number of libraries for machine learning and, more broadly, data science, making it easier to develop tech stacks that crunch large datasets.

When we decided to write this book, we felt that there was a gap in the book market in this area. This gap seemed at odds with the ever-growing importance of data, and in particular, alternative data. We live in a world that is rich in data, where many datasets are accessible at a relatively low cost. Hence, we thought it was worth writing a lengthy book that addresses the challenges of using data profitably. We do admit, though, that the world of alternative data and its use cases is, and will continue to be, subject to change in the near future. As a result, the path we paved with this book is also subject to change. Not least, the label “alternative data” might become obsolete, as such data could soon turn mainstream. Alternative data may simply become “data”. The technological and methodological feats that seem great today in making alternative data usable may soon become trivial exercises. New datasets from sources we cannot even imagine could begin to appear, and quantum computing could revolutionise the way we look at data.

We decided to target this book at the investment community. Applications, of course, can be found elsewhere, and indeed everywhere. Even within the financial domain, we could have discussed areas such as credit decisions or insurance pricing. We will not discuss these particular applications in this book, as we decided to focus on questions that an investor might face. Of course, we might consider adding these applications in future editions of the book.

At the time of writing, we are living in a world afflicted by COVID-19. It is a world in which it is very important for decision makers to make the right judgements, and to make them in a timely manner. Delays or poor decision making can have fatal consequences in the current environment. Having access to data streams that track people's foot traffic can be crucial in curbing the spread of the disease. Satellite or aerial images could help identify mass gatherings so that they can be dispersed for reasons of public safety. From an asset manager's point of view, creating nowcasts before official macroeconomic figures and company financial statements are released results in better investment decisions. It is no longer sufficient to wait several months to find out about the state of the economy. Investors want to be able to estimate such quantities at a very high frequency. Recent advances in technology and artificial intelligence make all this possible.

So, let us begin our journey through alternative data. We hope you will enjoy this book!

The Book of Alternative Data

A Guide for Investors, Traders, and Risk Managers

PART 1
Introduction and Theory