Details

Machine Learning and Big Data with kdb+/q


Wiley Finance, 1st edition

By: Jan Novotny, Paul A. Bilokon, Aris Galiotos, Frédéric Délèze

63,99 €

Publisher: Wiley
Format: PDF
Published: 12 November 2019
ISBN/EAN: 9781119404743
Language: English
Number of pages: 640

This is a DRM-protected eBook; to read it you will need, for example, Adobe Digital Editions and an Adobe ID.

Description

Upgrade your programming language to more effectively handle high-frequency data

Machine Learning and Big Data with kdb+/q offers quants, programmers and algorithmic traders a practical entry into the powerful but non-intuitive kdb+ database and q programming language. Ideally designed to handle the speed and volume of high-frequency financial data at sell- and buy-side institutions, these tools have become the de facto standard; this book provides the foundational knowledge practitioners need to work effectively with this rapidly evolving approach to analytical trading.

The discussion follows the natural progression of working strategy development to allow hands-on learning in a familiar sphere, illustrating the contrast in efficiency and capability between the q language and other programming approaches. Rather than an all-encompassing "bible"-type reference, this book is designed with a focus on real-world practicality to help you quickly get up to speed and become productive with the language.

- Understand why kdb+/q is the ideal solution for high-frequency data
- Delve into the "meat" of q programming to solve practical economic problems
- Perform everyday operations including basic regressions, cointegration, volatility estimation, modelling and more
- Learn advanced techniques, from market impact and microstructure analyses to machine learning methods including neural networks

The kdb+ database and its underlying programming language q offer unprecedented speed and capability. As trading algorithms and financial models grow ever more complex against the markets they seek to predict, they encompass an ever-larger swath of data: more variables, more metrics, more responsiveness and altogether more "moving parts."

Traditional programming languages increasingly fail to accommodate the growing speed and volume of data, and lack the flexibility that cutting-edge financial modelling demands. Machine Learning and Big Data with kdb+/q opens up the technology and flattens the learning curve to help you quickly adopt a more effective set of tools.
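For a flavour of the q idiom the book teaches, here is a minimal sketch (not taken from the book) of a per-symbol volume-weighted average price in qSQL; the toy trade table and its sym, price and size columns are hypothetical, while wavg and the select/by syntax are standard q:

q)trade:([] sym:`AAPL`AAPL`MSFT`MSFT; price:10 11 20 21f; size:100 300 200 200)
q)select vwap:size wavg price by sym from trade  / volume-weighted average price per symbol
sym | vwap
----| -----
AAPL| 10.75
MSFT| 20.5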
Contents

Preface xvii
About the Authors xxiii

Part One Language Fundamentals

Chapter 1 Fundamentals of the q Programming Language 3
1.1 The (Not So Very) First Steps in q 3
1.2 Atoms and Lists 5
1.2.1 Casting Types 11
1.3 Basic Language Constructs 14
1.3.1 Assigning, Equality and Matching 14
1.3.2 Arithmetic Operations and Right-to-Left Evaluation: Introduction to q Philosophy 17
1.4 Basic Operators 19
1.5 Difference between Strings and Symbols 31
1.5.1 Enumeration 31
1.6 Matrices and Basic Linear Algebra in q 33
1.7 Launching the Session: Additional Options 35
1.8 Summary and How-To's 38

Chapter 2 Dictionaries and Tables: The q Fundamentals 41
2.1 Dictionary 41
2.2 Table 44
2.3 The Truth about Tables 48
2.4 Keyed Tables are Dictionaries 50
2.5 From a Vector Language to an Algebraic Language 51

Chapter 3 Functions 57
3.1 Namespace 59
3.1.0.1 .quantQ. Namespace 60
3.2 The Six Adverbs 60
3.2.1 Each 60
3.2.1.1 Each 61
3.2.1.2 Each-left \: 61
3.2.1.3 Each-right /: 62
3.2.1.4 Cross Product /: \: 62
3.2.1.5 Each-both ' 63
3.2.2 Each-prior ': 66
3.2.3 Compose (') 67
3.2.4 Over and Fold / 67
3.2.5 Scan 68
3.2.5.1 EMA: The Exponential Moving Average 69
3.2.6 Converge 70
3.2.6.1 Converge-repeat 70
3.2.6.2 Converge-iterate 71
3.3 Apply 72
3.3.1 @ (apply) 72
3.3.2 . (apply) 73
3.4 Protected Evaluations 75
3.5 Vector Operations 76
3.5.1 Aggregators 76
3.5.1.1 Simple Aggregators 76
3.5.1.2 Weighted Aggregators 77
3.5.2 Uniform Functions 77
3.5.2.1 Running Functions 77
3.5.2.2 Window Functions 78
3.6 Convention for User-Defined Functions 79

Chapter 4 Editors and Other Tools 81
4.1 Console 81
4.2 Jupyter Notebook 82
4.3 GUIs 84
4.3.1 qStudio 85
4.3.2 Q Insight Pad 88
4.4 IDEs: IntelliJ IDEA 90
4.5 Conclusion 92

Chapter 5 Debugging q Code 93
5.1 Introduction to Making It Wrong: Errors 93
5.1.1 Syntax Errors 94
5.1.2 Runtime Errors 94
5.1.2.1 The Type Error 95
5.1.2.2 Other Errors 98
5.2 Debugging the Code 100
5.3 Debugging Server-Side 102

Part Two Data Operations

Chapter 6 Splayed and Partitioned Tables 107
6.1 Introduction 107
6.2 Saving a Table as a Single Binary File 108
6.3 Splayed Tables 110
6.4 Partitioned Tables 113
6.5 Conclusion 119

Chapter 7 Joins 121
7.1 Comma Operator 121
7.2 Join Functions 125
7.2.1 ij 125
7.2.2 ej 126
7.2.3 lj 126
7.2.4 pj 127
7.2.5 upsert 128
7.2.6 uj 129
7.2.7 aj 131
7.2.8 aj0 134
7.2.8.1 The Next Valid Join 135
7.2.9 asof 138
7.2.10 wj 140
7.3 Advanced Example: Running TWAP 144

Chapter 8 Parallelisation 151
8.1 Parallel Vector Operations 152
8.2 Parallelisation over Processes 155
8.3 Map-Reduce 155
8.4 Advanced Topic: Parallel File/Directory Access 158

Chapter 9 Data Cleaning and Filtering 161
9.1 Predicate Filtering 161
9.1.1 The Where Clause 161
9.1.2 Aggregation Filtering 163
9.2 Data Cleaning, Normalising and APIs 163

Chapter 10 Parse Trees 165
10.1 Definition 166
10.1.1 Evaluation 166
10.1.2 Parse Tree Creation 170
10.1.3 Read-Only Evaluation 170
10.2 Functional Queries 171
10.2.1 Functional Select 174
10.2.2 Functional Exec 178
10.2.3 Functional Update 179
10.2.4 Functional Delete 180

Chapter 11 A Few Use Cases 181
11.1 Rolling VWAP 181
11.1.1 N Tick VWAP 181
11.1.2 TimeWindow VWAP 182
11.2 Weighted Mid for N Levels of an Order Book 183
11.3 Consecutive Runs of a Rule 185
11.4 Real-Time Signals and Alerts 186

Part Three Data Science

Chapter 12 Basic Overview of Statistics 191
12.1 Histogram 191
12.2 First Moments 196
12.3 Hypothesis Testing 198
12.3.1 Normal p-values 198
12.3.2 Correlation 201
12.3.2.1 Implementation 202
12.3.3 t-test: One Sample 202
12.3.3.1 Implementation 204
12.3.4 t-test: Two Samples 204
12.3.4.1 Implementation 205
12.3.5 Sign Test 206
12.3.5.1 Implementation of the Test 208
12.3.5.2 Median Test 211
12.3.6 Wilcoxon Signed-Rank Test 212
12.3.7 Rank Correlation and Somers' D 214
12.3.7.1 Implementation 216
12.3.8 Multiple Hypothesis Testing 221
12.3.8.1 Bonferroni Correction 224
12.3.8.2 Šidák's Correction 224
12.3.8.3 Holm's Method 225
12.3.8.4 Example 226

Chapter 13 Linear Regression 229
13.1 Linear Regression 230
13.2 Ordinary Least Squares 231
13.3 The Geometric Representation of Linear Regression 233
13.3.1 Moore–Penrose Pseudoinverse 235
13.3.2 Adding Intercept 237
13.4 Implementation of the OLS 240
13.5 Significance of Parameters 243
13.6 How Good is the Fit: R² 244
13.6.1 Adjusted R-squared 247
13.7 Relationship with Maximum Likelihood Estimation and AIC with Small Sample Correction 248
13.8 Estimation Suite 252
13.9 Comparing Two Nested Models: Towards a Stopping Rule 254
13.9.1 Comparing Two General Models 256
13.10 In-/Out-of-Sample Operations 257
13.11 Cross-validation 262
13.12 Conclusion 264

Chapter 14 Time Series Econometrics 265
14.1 Autoregressive and Moving Average Processes 265
14.1.1 Introduction 265
14.1.2 AR(p) Process 266
14.1.2.1 Simulation 266
14.1.2.2 Estimation of AR(p) Parameters 268
14.1.2.3 Least Square Method 268
14.1.2.4 Example 269
14.1.2.5 Maximum Likelihood Estimator 269
14.1.2.6 Yule-Walker Technique 269
14.1.3 MA(q) Process 271
14.1.3.1 Estimation of MA(q) Parameters 272
14.1.3.2 Simulation 272
14.1.3.3 Example 273
14.1.4 ARMA(p, q) Process 273
14.1.4.1 Invertibility of the ARMA(p, q) Process 274
14.1.4.2 Hannan-Rissanen Algorithm: Two-Step Regression Estimation 274
14.1.4.3 Yule-Walker Estimation 274
14.1.4.4 Maximum Likelihood Estimation 275
14.1.4.5 Simulation 275
14.1.4.6 Forecast 276
14.1.5 ARIMA(p, d, q) Process 276
14.1.6 Code 276
14.1.6.1 Simulation 277
14.1.6.2 Estimation 278
14.1.6.3 Forecast 282
14.2 Stationarity and Granger Causality 285
14.2.1 Stationarity 285
14.2.2 Test of Stationarity – Dickey-Fuller and Augmented Dickey-Fuller Tests 286
14.2.3 Granger Causality 286
14.3 Vector Autoregression 287
14.3.1 VAR(p) Process 288
14.3.1.1 Notation 288
14.3.1.2 Estimator 288
14.3.1.3 Example 289
14.3.1.4 Code 293
14.3.2 VARX(p, q) Process 297
14.3.2.1 Estimator 297
14.3.2.2 Code 298

Chapter 15 Fourier Transform 301
15.1 Complex Numbers 301
15.1.1 Properties of Complex Numbers 302
15.2 Discrete Fourier Transform 308
15.3 Addendum: Quaternions 314
15.4 Addendum: Fractals 321

Chapter 16 Eigensystem and PCA 325
16.1 Theory 325
16.2 Algorithms 327
16.2.1 QR Decomposition 328
16.2.2 QR Algorithm for Eigenvalues 330
16.2.3 Inverse Iteration 331
16.3 Implementation of Eigensystem Calculation 332
16.3.1 QR Decomposition 333
16.3.2 Inverse Iteration 337
16.4 The Data Matrix and the Principal Component Analysis 341
16.4.1 The Data Matrix 341
16.4.2 PCA: The First Principal Component 344
16.4.3 Second Principal Component 345
16.4.4 Terminology and Explained Variance 347
16.4.5 Dimensionality Reduction 349
16.4.6 PCA Regression (PCR) 350
16.5 Implementation of PCA 351
16.6 Appendix: Determinant 354
16.6.1 Theory 354
16.6.2 Techniques to Calculate a Determinant 355
16.6.3 Implementation of the Determinant 356

Chapter 17 Outlier Detection 359
17.1 Local Outlier Factor 360

Chapter 18 Simulating Asset Prices 369
18.1 Stochastic Volatility Process with Price Jumps 369
18.2 Towards the Numerical Example 371
18.2.1 Numerical Preliminaries 371
18.2.2 Implementing Stochastic Volatility Process with Jumps 374
18.3 Conclusion 378

Part Four Machine Learning

Chapter 19 Basic Principles of Machine Learning 381
19.1 Non-Numeric Features and Normalisation 381
19.1.1 Non-Numeric Features 381
19.1.1.1 Ordinal Features 382
19.1.1.2 Categorical Features 383
19.1.2 Normalisation 383
19.1.2.1 Normal Score 384
19.1.2.2 Range Scaling 385
19.2 Iteration: Constructing Machine Learning Algorithms 386
19.2.1 Iteration 386
19.2.2 Constructing Machine Learning Algorithms 389

Chapter 20 Linear Regression with Regularisation 391
20.1 Bias–Variance Trade-off 392
20.2 Regularisation 393
20.3 Ridge Regression 394
20.4 Implementation of the Ridge Regression 396
20.4.1 Optimisation of the Regularisation Parameter 401
20.5 Lasso Regression 403
20.6 Implementation of the Lasso Regression 405

Chapter 21 Nearest Neighbours 419
21.1 k-Nearest Neighbours Classifier 419
21.2 Prototype Clustering 423
21.3 Feature Selection: Local Nearest Neighbours Approach 429
21.3.1 Implementation 430

Chapter 22 Neural Networks 437
22.1 Theoretical Introduction 437
22.1.1 Calibration 440
22.1.1.1 Backpropagation 441
22.1.2 The Learning Rate Parameter 443
22.1.3 Initialisation 443
22.1.4 Overfitting 444
22.1.5 Dimension of the Hidden Layer(s) 444
22.2 Implementation of Neural Networks 445
22.2.1 Multivariate Encoder 445
22.2.2 Neurons 446
22.2.3 Training the Neural Network 448
22.3 Examples 451
22.3.1 Binary Classification 451
22.3.2 M-class Classification 454
22.3.3 Regression 457
22.4 Possible Suggestions 463

Chapter 23 AdaBoost with Stumps 465
23.1 Boosting 465
23.2 Decision Stumps 466
23.3 AdaBoost 467
23.4 Implementation of AdaBoost 468
23.5 Recommendation for Readers 474

Chapter 24 Trees 477
24.1 Introduction to Trees 477
24.2 Regression Trees 479
24.2.1 Cost-Complexity Pruning 481
24.3 Classification Tree 482
24.4 Miscellaneous 484
24.5 Implementation of Trees 485

Chapter 25 Forests 495
25.1 Bootstrap 495
25.2 Bagging 498
25.2.1 Out-of-Bag 499
25.3 Implementation 500
25.3.1 Prediction 503
25.3.2 Feature Selection 505

Chapter 26 Unsupervised Machine Learning: The Apriori Algorithm 509
26.1 Apriori Algorithm 510
26.2 Implementation of the Apriori Algorithm 511

Chapter 27 Processing Information 523
27.1 Information Retrieval 523
27.1.1 Corpus: Leonardo da Vinci 523
27.1.2 Frequency Counting 524
27.1.3 tf-idf 528
27.2 Information as Features 532
27.2.1 Sample: Simulated Proteins 533
27.2.2 Kernels and Metrics for Proteins 535
27.2.3 Implementation of Inner Products and Nearest Neighbours Principles 535
27.2.4 Further Topics 539

Chapter 28 Towards AI – Monte Carlo Tree Search 541
28.1 Multi-Armed Bandit Problem 541
28.1.1 Analytic Solutions 543
28.1.2 Greedy Algorithms 543
28.1.3 Confidence-Based Algorithms 544
28.1.4 Bayesian Algorithms 546
28.1.5 Online Gradient Descent Algorithms 547
28.1.6 Implementation of Some Learning Algorithms 547
28.2 Monte Carlo Tree Search 558
28.2.1 Selection Step 561
28.2.2 Expansion Step 562
28.2.3 Simulation Step 563
28.2.4 Back Propagation Step 563
28.2.5 Finishing the Algorithm 563
28.2.6 Remarks and Extensions 564
28.3 Monte Carlo Tree Search Implementation – Tic-tac-toe 565
28.3.1 Random Games 566
28.3.2 Towards the MCTS 570
28.3.3 Case Study 579
28.4 Monte Carlo Tree Search – Additional Comments 579
28.4.1 Policy and Value Networks 579
28.4.2 Reinforcement Learning 581

Chapter 29 Econophysics: The Agent-Based Computational Models 583
29.1 Agent-Based Modelling 584
29.1.1 Agent-Based Models in Society 584
29.1.2 Agent-Based Models in Finance 586
29.2 Ising Agent-Based Model for Financial Markets 587
29.2.1 Ising Model in Physics 587
29.2.2 Ising Model of Interacting Agents 587
29.2.3 Numerical Implementation 588
29.3 Conclusion 592

Chapter 30 Epilogue: Art 595

Bibliography 601
Index 607
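Section 3.2.5.1 above expresses the exponential moving average y[t] = a*x[t] + (1-a)*y[t-1] through the scan adverb. A minimal sketch of that idea, assuming a smoothing factor a between 0 and 1 (the function name ema0 and the sample series are illustrative only; newer q versions also provide a built-in ema keyword):

q)ema0:{[a;s] first[s](1f-a)\a*s}  / seeded scan: y[t]=a*s[t]+(1-a)*y[t-1]
q)ema0[0.5;1 2 3 4f]
1 1.5 2.25 3.125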
About the Authors

JAN NOVOTNY is an eFX quant trader at Deutsche Bank. Previously, he worked at the Centre for Econometric Analysis on high-frequency econometric models. He holds a PhD from CERGE-EI, Charles University, Prague.

PAUL A. BILOKON is CEO and founder of Thalesians Ltd and an expert in algorithmic trading. He previously worked at Nomura, Lehman Brothers, and Morgan Stanley. Paul was educated at Christ Church College, Oxford, and Imperial College.

ARIS GALIOTOS is the global technical lead for the eFX kdb+ team at HSBC, where he helps develop a big data installation processing billions of real-time records per day. Aris holds an MSc in Financial Mathematics with Distinction from the University of Edinburgh.

FRÉDÉRIC DÉLÈZE is an independent algorithm trader and consultant. He has designed automated trading strategies for hedge funds and developed quantitative risk models for investment banks. He holds a PhD in Finance from Hanken School of Economics, Helsinki.
Develop solid high-frequency strategies with q's unprecedented speed and efficiency

In the world of high-frequency trading, the q programming language and kdb+ database have risen to the top of the ranks as tools for implementing quantitative analyses of all types. Until now, there has been a lack of accessible, implementation-focused books to assist in data science and machine learning using this technology. Machine Learning and Big Data with kdb+/q bridges this conspicuous gap, providing you with a practical introduction to the q language and a guide to using data science to enable data-driven decision making. You'll also learn the basic principles and techniques underpinning powerful trading mechanisms based upon machine learning.

This book opens the world of q and kdb+ to a wide audience, as it emphasises solutions to problems of practical importance. Implementations covered include:

- Data description and summary statistics
- Basic regression methods and cointegration (see the OLS sketch below)
- Volatility estimation and time series modelling
- Advanced machine learning techniques, including neural networks, random forests, and principal component analysis
- Techniques useful beyond finance, related to text analysis, game engines and agent-based models

Written by four top figures in global quantitative finance and technology, Machine Learning and Big Data with kdb+/q is a valuable resource in high-frequency trading.
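The regression bullet above corresponds to Chapter 13, where ordinary least squares is assembled from q's linear-algebra primitives. A minimal sketch of the normal-equations route beta = (X'X)^(-1) X'y using the standard flip, mmu and inv keywords; the function name, the toy data and the hand-prepended intercept column are illustrative only, and the recovered coefficients are exact here only up to floating-point noise:

q)ols:{[y;X] (inv flip[X] mmu X) mmu flip[X] mmu y}  / beta = (X'X)^-1 X'y
q)X:flip(1 1 1 1f;1 2 3 4f)  / 4x2 design matrix: intercept column and one regressor
q)ols[3 5 7 9f;X]  / data generated exactly by y = 1 + 2x
1 2f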

You may also be interested in these products:

Mindfulness
By: Gill Hasson
PDF ebook
12,99 €

Counterparty Credit Risk, Collateral and Funding
By: Damiano Brigo, Massimo Morini, Andrea Pallavicini
EPUB ebook
69,99 €