<p>CONTRIBUTORS xv</p> <p>FOREWORD xvii</p> <p>PREFACE xix</p> <p>THE EDITORS xxix</p> <p><b>PART I STRATEGIES FOR SUCCESS IN THE DIGITAL-DATA REVOLUTION 1</b></p> <p><b>1. The Digital-Data Challenge 5</b><br /> <i>Malcolm Atkinson and Mark Parsons</i></p> <p>1.1 The Digital Revolution 5</p> <p>1.2 Changing How We Think and Behave 6</p> <p>1.3 Moving Adroitly in this Fast-Changing Field 8</p> <p>1.4 Digital-Data Challenges Exist Everywhere 8</p> <p>1.5 Changing How We Work 9</p> <p>1.6 Divide and Conquer Offers the Solution 10</p> <p>1.7 Engineering Data-to-Knowledge Highways 12</p> <p><b>2. The Digital-Data Revolution 15</b><br /> <i>Malcolm Atkinson</i></p> <p>2.1 Data, Information, and Knowledge 16</p> <p>2.2 Increasing Volumes and Diversity of Data 18</p> <p>2.3 Changing the Ways We Work with Data 28</p> <p><b>3. The Data-Intensive Survival Guide 37</b><br /> <i>Malcolm Atkinson</i></p> <p>3.1 Introduction: Challenges and Strategy 38</p> <p>3.2 Three Categories of Expert 39</p> <p>3.3 The Data-Intensive Architecture 41</p> <p>3.4 An Operational Data-Intensive System 42</p> <p>3.5 Introducing DISPEL 44</p> <p>3.6 A Simple DISPEL Example 45</p> <p>3.7 Supporting Data-Intensive Experts 47</p> <p>3.8 DISPEL in the Context of Contemporary Systems 48</p> <p>3.9 Datascopes 51</p> <p>3.10 Ramps for Incremental Engagement 54</p> <p>3.11 Readers’ Guide to the Rest of This Book 56</p> <p><b>4. Data-Intensive Thinking with DISPEL 61</b><br /> <i>Malcolm Atkinson</i></p> <p>4.1 Processing Elements 62</p> <p>4.2 Connections 64</p> <p>4.3 Data Streams and Structure 65</p> <p>4.4 Functions 66</p> <p>4.5 The Three-Level Type System 72</p> <p>4.6 Registry, Libraries, and Descriptions 81</p> <p>4.7 Achieving Data-Intensive Performance 86</p> <p>4.8 Reliability and Control 108</p> <p>4.9 The Data-to-Knowledge Highway 116</p> <p><b>PART II DATA-INTENSIVE KNOWLEDGE DISCOVERY 123</b></p> <p><b>5. 
Data-Intensive Analysis 127</b><br /> <i>Oscar Corcho and Jano van Hemert</i></p> <p>5.1 Knowledge Discovery in Telco Inc. 128</p> <p>5.2 Understanding Customers to Prevent Churn 130</p> <p>5.3 Preventing Churn Across Multiple Companies 134</p> <p>5.4 Understanding Customers by Combining Heterogeneous Public and Private Data 137</p> <p>5.5 Conclusions 144</p> <p><b>6. Problem Solving in Data-Intensive Knowledge Discovery 147</b><br /> <i>Oscar Corcho and Jano van Hemert</i></p> <p>6.1 The Conventional Life Cycle of Knowledge Discovery 148</p> <p>6.2 Knowledge Discovery Over Heterogeneous Data Sources 155</p> <p>6.3 Knowledge Discovery from Private and Public, Structured and Nonstructured Data 158</p> <p>6.4 Conclusions 162</p> <p><b>7. Data-Intensive Components and Usage Patterns 165</b><br /> <i>Oscar Corcho</i></p> <p>7.1 Data Source Access and Transformation Components 166</p> <p>7.2 Data Integration Components 172</p> <p>7.3 Data Preparation and Processing Components 173</p> <p>7.4 Data-Mining Components 174</p> <p>7.5 Visualization and Knowledge Delivery Components 176</p> <p><b>8. Sharing and Reuse in Knowledge Discovery 181</b><br /> <i>Oscar Corcho</i></p> <p>8.1 Strategies for Sharing and Reuse 182</p> <p>8.2 Data Analysis Ontologies for Data Analysis Experts 185</p> <p>8.3 Generic Ontologies for Metadata Generation 188</p> <p>8.4 Domain Ontologies for Domain Experts 189</p> <p>8.5 Conclusions 190</p> <p><b>PART III DATA-INTENSIVE ENGINEERING 193</b></p> <p><b>9. Platforms for Data-Intensive Analysis 197</b><br /> <i>David Snelling</i></p> <p>9.1 The Hourglass Reprise 198</p> <p>9.2 The Motivation for a Platform 200</p> <p>9.3 Realization 201</p> <p><b>10. 
Definition of the DISPEL Language 203</b><br /> <i>Paul Martin and Gagarine Yaikhom</i></p> <p>10.1 A Simple Example 204</p> <p>10.2 Processing Elements 205</p> <p>10.3 Data Streams 213</p> <p>10.4 Type System 217</p> <p>10.5 Registration 222</p> <p>10.6 Packaging 224</p> <p>10.7 Workflow Submission 225</p> <p>10.8 Examples of DISPEL 227</p> <p>10.9 Summary 235</p> <p><b>11. DISPEL Development 237</b><br /> <i>Adrian Mouat and David Snelling</i></p> <p>11.1 The Development Landscape 237</p> <p>11.2 Data-Intensive Workbenches 239</p> <p>11.3 Data-Intensive Component Libraries 247</p> <p>11.4 Summary 248</p> <p><b>12. DISPEL Enactment 251</b><br /> <i>Chee Sun Liew, Amrey Krause, and David Snelling</i></p> <p>12.1 Overview of DISPEL Enactment 251</p> <p>12.2 DISPEL Language Processing 253</p> <p>12.3 DISPEL Optimization 255</p> <p>12.4 DISPEL Deployment 266</p> <p>12.5 DISPEL Execution and Control 268</p> <p><b>PART IV DATA-INTENSIVE APPLICATION EXPERIENCE 275</b></p> <p><b>13. The Application Foundations of DISPEL 277</b><br /> <i>Rob Baxter</i></p> <p>13.1 Characteristics of Data-Intensive Applications 277</p> <p>13.2 Evaluating Application Performance 280</p> <p>13.3 Reviewing the Data-Intensive Strategy 283</p> <p><b>14. Analytical Platform for Customer Relationship Management 287</b><br /> <i>Maciej Jarka and Mark Parsons</i></p> <p>14.1 Data Analysis in the Telecoms Business 288</p> <p>14.2 Analytical Customer Relationship Management 289</p> <p>14.3 Scenario 1: Churn Prediction 291</p> <p>14.4 Scenario 2: Cross Selling 293</p> <p>14.5 Exploiting the Models and Rules 296</p> <p>14.6 Summary: Lessons Learned 299</p> <p><b>15. 
Environmental Risk Management 301</b><br /> <i>Ladislav Hluchy, Ondrej Habala, Viet Tran, and Branislav Simo</i></p> <p>15.1 Environmental Modeling 302</p> <p>15.2 Cascading Simulation Models 303</p> <p>15.3 Environmental Data Sources and Their Management 305</p> <p>15.4 Scenario 1: ORAVA 309</p> <p>15.5 Scenario 2: RADAR 313</p> <p>15.6 Scenario 3: SVP 318</p> <p>15.7 New Technologies for Environmental Data Mining 321</p> <p>15.8 Summary: Lessons Learned 323</p> <p><b>16. Analyzing Gene Expression Imaging Data in Developmental Biology 327</b><br /> <i>Liangxiu Han, Jano van Hemert, Ian Overton, Paolo Besana, and Richard Baldock</i></p> <p>16.1 Understanding Biological Function 328</p> <p>16.2 Gene Image Annotation 330</p> <p>16.3 Automated Annotation of Gene Expression Images 331</p> <p>16.4 Exploitation and Future Work 341</p> <p>16.5 Summary 345</p> <p><b>17. Data-Intensive Seismology: Research Horizons 353</b><br /> <i>Michelle Galea, Andreas Rietbrock, Alessandro Spinuso, and Luca Trani</i></p> <p>17.1 Introduction 354</p> <p>17.2 Seismic Ambient Noise Processing 356</p> <p>17.3 Solution Implementation 358</p> <p>17.4 Evaluation 369</p> <p>17.5 Further Work 372</p> <p>17.6 Conclusions 373</p> <p><b>PART V DATA-INTENSIVE BEACONS OF SUCCESS 377</b></p> <p><b>18. Data-Intensive Methods in Astronomy 381</b><br /> <i>Thomas D. Kitching, Robert G. Mann, Laura E. Valkonen, Mark S. Holliman, Alastair Hume, and Keith T. Noddle</i></p> <p>18.1 Introduction 381</p> <p>18.2 The Virtual Observatory 382</p> <p>18.3 Data-Intensive Photometric Classification of Quasars 383</p> <p>18.4 Probing the Dark Universe with Weak Gravitational Lensing 387</p> <p>18.5 Future Research Issues 392</p> <p>18.6 Conclusions 392</p> <p><b>19. 
The World at One's Fingertips: Interactive Interpretation of Environmental Data 395</b><br /> <i>Jon Blower, Keith Haines, and Alastair Gemmell</i></p> <p>19.1 Introduction 395</p> <p>19.2 The Current State of the Art 397</p> <p>19.3 The Technical Landscape 401</p> <p>19.4 Interactive Visualization 403</p> <p>19.5 From Visualization to Intercomparison 406</p> <p>19.6 Future Development: The Environmental Cloud 409</p> <p>19.7 Conclusions 411</p> <p><b>20. Data-Driven Research in the Humanities—the DARIAH Research Infrastructure 417</b><br /> <i>Andreas Aschenbrenner, Tobias Blanke, Christiane Fritze, and Wolfgang Pempe</i></p> <p>20.1 Introduction 417</p> <p>20.2 The Tradition of Digital Humanities 420</p> <p>20.3 Humanities Research Data 422</p> <p>20.4 Use Case 426</p> <p>20.5 Conclusion and Future Development 429</p> <p><b>21. Analysis of Large and Complex Engineering and Transport Data 431</b><br /> <i>Jim Austin</i></p> <p>21.1 Introduction 431</p> <p>21.2 Applications and Challenges 432</p> <p>21.3 The Methods Used 434</p> <p>21.4 Future Developments 438</p> <p>21.5 Conclusions 439</p> <p>References 440</p> <p><b>22. Estimating Species Distributions—Across Space, Through Time, and with Features of the Environment 441</b><br /> <i>Steve Kelling, Daniel Fink, Wesley Hochachka, Ken Rosenberg, Robert Cook, Theodoros Damoulas, Claudio Silva, and William Michener</i></p> <p>22.1 Introduction 442</p> <p>22.2 Data Discovery, Access, and Synthesis 443</p> <p>22.3 Model Development 448</p> <p>22.4 Managing Computational Requirements 449</p> <p>22.5 Exploring and Visualizing Model Results 450</p> <p>22.6 Analysis Results 452</p> <p>22.7 Conclusion 454</p> <p><b>PART VI THE DATA-INTENSIVE FUTURE 459</b></p> <p><b>23. Data-Intensive Trends 461</b><br /> <i>Malcolm Atkinson and Paolo Besana</i></p> <p>23.1 Reprise 461</p> <p>23.2 Data-Intensive Applications 469</p> <p><b>24. 
Data-Rich Futures 477</b><br /> <i>Malcolm Atkinson</i></p> <p>24.1 Future Data Infrastructure 478</p> <p>24.2 Future Data Economy 485</p> <p>24.3 Future Data Society and Professionalism 489</p> <p>References 494</p> <p><b>Appendix A: Glossary 499</b><br /> <i>Michelle Galea and Malcolm Atkinson</i></p> <p><b>Appendix B: DISPEL Reference Manual 507</b><br /> <i>Paul Martin</i></p> <p><b>Appendix C: Component Definitions 531</b><br /> <i>Malcolm Atkinson and Chee Sun Liew</i></p> <p>INDEX 537</p>