Details

High-Performance Parallel Database Processing and Grid Databases


High-Performance Parallel Database Processing and Grid Databases


Wiley Series on Parallel and Distributed Computing, Band 67 1. Aufl.

von: David Taniar, Clement H. C. Leung, Wenny Rahayu, Sushant Goel

158,99 €

Verlag: Wiley
Format: PDF
Veröffentl.: 17.09.2008
ISBN/EAN: 9780470391358
Sprache: englisch
Anzahl Seiten: 576

DRM-geschütztes eBook, Sie benötigen z.B. Adobe Digital Editions und eine Adobe ID zum Lesen.

Beschreibungen

<b>The latest techniques and principles of parallel and grid database processing</b> <p>The growth in grid databases, coupled with the utility of parallel query processing, presents an important opportunity to understand and utilize high-performance parallel database processing within a major database management system (DBMS). This important new book provides readers with a fundamental understanding of parallelism in data-intensive applications, and demonstrates how to develop faster capabilities to support them. It presents a balanced treatment of the theoretical and practical aspects of high-performance databases to demonstrate how parallel query is executed in a DBMS, including concepts, algorithms, analytical models, and grid transactions.</p> <p><i>High-Performance Parallel Database Processing and Grid Databases</i> serves as a valuable resource for researchers working in parallel databases and for practitioners interested in building a high-performance database. It is also a much-needed, self-contained textbook for database courses at the advanced undergraduate and graduate levels.</p>
<p>Preface xv</p> <p><b>Part I Introduction</b></p> <p><b>1. Introduction 3</b></p> <p>1.1. A Brief Overview: Parallel Databases and Grid Databases 4</p> <p>1.2. Parallel Query Processing: Motivations 5</p> <p>1.3. Parallel Query Processing: Objectives 7</p> <p>1.3.1. Speed Up 7</p> <p>1.3.2. Scale Up 8</p> <p>1.3.3. Parallel Obstacles 10</p> <p>1.4. Forms of Parallelism 12</p> <p>1.4.1. Interquery Parallelism 13</p> <p>1.4.2. Intraquery Parallelism 14</p> <p>1.4.3. Intraoperation Parallelism 15</p> <p>1.4.4. Interoperation Parallelism 15</p> <p>1.4.5. Mixed Parallelism—A More Practical Solution 18</p> <p>1.5. Parallel Database Architectures 19</p> <p>1.5.1. Shared-Memory and Shared-Disk Architectures 20</p> <p>1.5.2. Shared-Nothing Architecture 22</p> <p>1.5.3. Shared-Something Architecture 23</p> <p>1.5.4. Interconnection Networks 24</p> <p>1.6. Grid Database Architecture 26</p> <p>1.7. Structure of this Book 29</p> <p>1.8. Summary 30</p> <p>1.9. Bibliographical Notes 30</p> <p>1.10. Exercises 31</p> <p><b>2. Analytical Models 33</b></p> <p>2.1. Cost Models 33</p> <p>2.2. Cost Notations 34</p> <p>2.2.1. Data Parameters 34</p> <p>2.2.2. Systems Parameters 36</p> <p>2.2.3. Query Parameters 37</p> <p>2.2.4. Time Unit Costs 37</p> <p>2.2.5. Communication Costs 38</p> <p>2.3. Skew Model 39</p> <p>2.4. Basic Operations in Parallel Databases 43</p> <p>2.4.1. Disk Operations 44</p> <p>2.4.2. Main Memory Operations 45</p> <p>2.4.3. Data Computation and Data Distribution 45</p> <p>2.5. Summary 47</p> <p>2.6. Bibliographical Notes 47</p> <p>2.7. Exercises 47</p> <p><b>Part II Basic Query Parallelism</b></p> <p><b>3. Parallel Search 51</b></p> <p>3.1. Search Queries 51</p> <p>3.1.1. Exact-Match Search 52</p> <p>3.1.2. Range Search Query 53</p> <p>3.1.3. Multiattribute Search Query 54</p> <p>3.2. Data Partitioning 54</p> <p>3.2.1. Basic Data Partitioning 55</p> <p>3.2.2. Complex Data Partitioning 60</p> <p>3.3. Search Algorithms 69</p> <p>3.3.1. Serial Search Algorithms 69</p> <p>3.3.2. Parallel Search Algorithms 73</p> <p>3.4. Summary 74</p> <p>3.5. Bibliographical Notes 75</p> <p>3.6. Exercises 75</p> <p><b>4. Parallel Sort and GroupBy 77</b></p> <p>4.1. Sorting, Duplicate Removal, and Aggregate Queries 78</p> <p>4.1.1. Sorting and Duplicate Removal 78</p> <p>4.1.2. Scalar Aggregate 79</p> <p>4.1.3. GroupBy 80</p> <p>4.2. Serial External Sorting Method 80</p> <p>4.3. Algorithms for Parallel External Sort 83</p> <p>4.3.1. Parallel Merge-All Sort 83</p> <p>4.3.2. Parallel Binary-Merge Sort 85</p> <p>4.3.3. Parallel Redistribution Binary-Merge Sort 86</p> <p>4.3.4. Parallel Redistribution Merge-All Sort 88</p> <p>4.3.5. Parallel Partitioned Sort 90</p> <p>4.4. Parallel Algorithms for GroupBy Queries 92</p> <p>4.4.1. Traditional Methods (Merge-All and Hierarchical Merging) 92</p> <p>4.4.2. Two-Phase Method 93</p> <p>4.4.3. Redistribution Method 94</p> <p>4.5. Cost Models for Parallel Sort 96</p> <p>4.5.1. Cost Models for Serial External Merge-Sort 96</p> <p>4.5.2. Cost Models for Parallel Merge-All Sort 98</p> <p>4.5.3. Cost Models for Parallel Binary-Merge Sort 100</p> <p>4.5.4. Cost Models for Parallel Redistribution Binary-Merge Sort 101</p> <p>4.5.5. Cost Models for Parallel Redistribution Merge-All Sort 102</p> <p>4.5.6. Cost Models for Parallel Partitioned Sort 103</p> <p>4.6. Cost Models for Parallel GroupBy 104</p> <p>4.6.1. Cost Models for Parallel Two-Phase Method 104</p> <p>4.6.2. Cost Models for Parallel Redistribution Method 107</p> <p>4.7. Summary 109</p> <p>4.8. Bibliographical Notes 110</p> <p>4.9. Exercises 110</p> <p><b>5. Parallel Join 112</b></p> <p>5.1. Join Operations 112</p> <p>5.2. Serial Join Algorithms 114</p> <p>5.2.1. Nested-Loop Join Algorithm 114</p> <p>5.2.2. Sort-Merge Join Algorithm 116</p> <p>5.2.3. Hash-Based Join Algorithm 117</p> <p>5.2.4. Comparison 120</p> <p>5.3. Parallel Join Algorithms 120</p> <p>5.3.1. Divide and Broadcast-Based Parallel Join Algorithms 121</p> <p>5.3.2. Disjoint Partitioning-Based Parallel Join Algorithms 124</p> <p>5.4. Cost Models 128</p> <p>5.4.1. Cost Models for Divide and Broadcast 128</p> <p>5.4.2. Cost Models for Disjoint Partitioning 129</p> <p>5.4.3. Cost Models for Local Join 130</p> <p>5.5. Parallel Join Optimization 132</p> <p>5.5.1. Optimizing Main Memory 132</p> <p>5.5.2. Load Balancing 133</p> <p>5.6. Summary 134</p> <p>5.7. Bibliographical Notes 135</p> <p>5.8. Exercises 136</p> <p><b>Part III Advanced Parallel Query Processing</b></p> <p><b>6. Parallel GroupBy-Join 141</b></p> <p>6.1. Groupby-Join Queries 141</p> <p>6.1.1. Groupby Before Join 142</p> <p>6.1.2. Groupby After Join 142</p> <p>6.2. Parallel Algorithms for Groupby-Before-Join Query Processing 143</p> <p>6.2.1. Early Distribution Scheme 143</p> <p>6.2.2. Early GroupBy with Partitioning Scheme 145</p> <p>6.2.3. Early GroupBy with Replication Scheme 146</p> <p>6.3. Parallel Algorithms for Groupby-After-Join Query Processing 148</p> <p>6.3.1. Join Partitioning Scheme 148</p> <p>6.3.2. GroupBy Partitioning Scheme 150</p> <p>6.4. Cost Model Notations 151</p> <p>6.5. Cost Model for Groupby-Before-Join Query Processing 153</p> <p>6.5.1. Cost Models for the Early Distribution Scheme 153</p> <p>6.5.2. Cost Models for the Early GroupBy with Partitioning Scheme 156</p> <p>6.5.3. Cost Models for the Early GroupBy with Replication Scheme 158</p> <p>6.6. Cost Model for “Groupby-After-Join” Query Processing 159</p> <p>6.6.1. Cost Models for the Join Partitioning Scheme 159</p> <p>6.6.2. Cost Models for the GroupBy Partitioning Scheme 161</p> <p>6.7. Summary 163</p> <p>6.8. Bibliographical Notes 164</p> <p>6.9. Exercises 164</p> <p><b>7. Parallel Indexing 167</b></p> <p>7.1. Parallel Indexing–an Internal Perspective on Parallel Indexing Structures 168</p> <p>7.2. Parallel Indexing Structures 169</p> <p>7.2.1. Nonreplicated Indexing (NRI) Structures 169</p> <p>7.2.2. Partially Replicated Indexing (PRI) Structures 171</p> <p>7.2.3. Fully Replicated Indexing (FRI) Structures 178</p> <p>7.3. Index Maintenance 180</p> <p>7.3.1. Maintaining a Parallel Nonreplicated Index 182</p> <p>7.3.2. Maintaining a Parallel Partially Replicated Index 182</p> <p>7.3.3. Maintaining a Parallel Fully Replicated Index 188</p> <p>7.3.4. Complexity Degree of Index Maintenance 188</p> <p>7.4. Index Storage Analysis 188</p> <p>7.4.1. Storage Cost Models for Uniprocessors 189</p> <p>7.4.2. Storage Cost Models for Parallel Processors 191</p> <p>7.5. Parallel Processing of Search Queries using Index 192</p> <p>7.5.1. Parallel One-Index Search Query Processing 192</p> <p>7.5.2. Parallel Multi-Index Search Query Processing 195</p> <p>7.6. Parallel Index Join Algorithms 200</p> <p>7.6.1. Parallel One-Index Join 200</p> <p>7.6.2. Parallel Two-Index Join 203</p> <p>7.7. Comparative Analysis 207</p> <p>7.7.1. Comparative Analysis of Parallel Search Index 207</p> <p>7.7.2. Comparative Analysis of Parallel Index Join 213</p> <p>7.8. Summary 216</p> <p>7.9. Bibliographical Notes 217</p> <p>7.10. Exercises 217</p> <p><b>8. Parallel Universal Qualification—Collection Join Queries 219</b></p> <p>8.1. Universal Quantification and Collection Join 220</p> <p>8.2. Collection Types and Collection Join Queries 222</p> <p>8.2.1. Collection-Equi Join Queries 222</p> <p>8.2.2. Collection–Intersect Join Queries 223</p> <p>8.2.3. Subcollection Join Queries 224</p> <p>8.3. Parallel Algorithms for Collection Join Queries 225</p> <p>8.4. Parallel Collection-Equi Join Algorithms 225</p> <p>8.4.1. Disjoint Data Partitioning 226</p> <p>8.4.2. Parallel Double Sort-Merge Collection-Equi Join Algorithm 227</p> <p>8.4.3. Parallel Sort-Hash Collection-Equi Join Algorithm 228</p> <p>8.4.4. Parallel Hash Collection-Equi Join Algorithm 232</p> <p>8.5. Parallel Collection-Intersect Join Algorithms 233</p> <p>8.5.1. Non-Disjoint Data Partitioning 234</p> <p>8.5.2. Parallel Sort-Merge Nested-Loop Collection-Intersect Join Algorithm 244</p> <p>8.5.3. Parallel Sort-Hash Collection-Intersect Join Algorithm 245</p> <p>8.5.4. Parallel Hash Collection-Intersect Join Algorithm 246</p> <p>8.6. Parallel Subcollection Join Algorithms 246</p> <p>8.6.1. Data Partitioning 247</p> <p>8.6.2. Parallel Sort-Merge Nested-Loop Subcollection Join Algorithm 248</p> <p>8.6.3. Parallel Sort-Hash Subcollection Join Algorithm 249</p> <p>8.6.4. Parallel Hash Subcollection Join Algorithm 251</p> <p>8.7. Summary 252</p> <p>8.8. Bibliographical Notes 252</p> <p>8.9. Exercises 254</p> <p><b>9. Parallel Query Scheduling and Optimization 256</b></p> <p>9.1. Query Execution Plan 257</p> <p>9.2. Subqueries Execution Scheduling Strategies 259</p> <p>9.2.1. Serial Execution Among Subqueries 259</p> <p>9.2.2. Parallel Execution Among Subqueries 261</p> <p>9.3. Serial vs. Parallel Execution Scheduling 264</p> <p>9.3.1. Nonskewed Subqueries 264</p> <p>9.3.2. Skewed Subqueries 265</p> <p>9.3.3. Skewed and Nonskewed Subqueries 267</p> <p>9.4. Scheduling Rules 269</p> <p>9.5. Cluster Query Processing Model 270</p> <p>9.5.1. Overview of Dynamic Query Processing 271</p> <p>9.5.2. A Cluster Query Processing Architecture 272</p> <p>9.5.3. Load Information Exchange 273</p> <p>9.6. Dynamic Cluster Query Optimization 275</p> <p>9.6.1. Correction 276</p> <p>9.6.2. Migration 280</p> <p>9.6.3. Partition 281</p> <p>9.7. Other Approaches to Dynamic Query Optimization 284</p> <p>9.8. Summary 285</p> <p>9.9. Bibliographical Notes 286</p> <p>9.10. Exercises 286</p> <p><b>Part IV Grid Databases</b></p> <p><b>10. Transactions in Distributed and Grid Databases 291</b></p> <p>10.1. Grid Database Challenges 292</p> <p>10.2. Distributed Database Systems and Multidatabase Systems 293</p> <p>10.2.1. Distributed Database Systems 293</p> <p>10.2.2. Multidatabase Systems 297</p> <p>10.3. Basic Definitions on Transaction Management 299</p> <p>10.4. Acid Properties of Transactions 301</p> <p>10.5. Transaction Management in Various Database Systems 303</p> <p>10.5.1. Transaction Management in Centralized and Homogeneous Distributed Database Systems 303</p> <p>10.5.2. Transaction Management in Heterogeneous Distributed Database Systems 305</p> <p>10.6. Requirements in Grid Database Systems 307</p> <p>10.7. Concurrency Control Protocols 309</p> <p>10.8. Atomic Commit Protocols 310</p> <p>10.8.1. Homogeneous Distributed Database Systems 310</p> <p>10.8.2. Heterogeneous Distributed Database Systems 313</p> <p>10.9. Replica Synchronization Protocols 314</p> <p>10.9.1. Network Partitioning 315</p> <p>10.9.2. Replica Synchronization Protocols 316</p> <p>10.10. Summary 318</p> <p>10.11. Bibliographical Notes 318</p> <p>10.12. Exercises 319</p> <p><b>11. Grid Concurrency Control 321</b></p> <p>11.1. A Grid Database Environment 321</p> <p>11.2. An Example 322</p> <p>11.3. Grid Concurrency Control 324</p> <p>11.3.1. Basic Functions Required by GCC 324</p> <p>11.3.2. Grid Serializability Theorem 325</p> <p>11.3.3. Grid Concurrency Control Protocol 329</p> <p>11.3.4. Revisiting the Earlier Example 333</p> <p>11.3.5. Comparison with Traditional Concurrency Control Protocols 334</p> <p>11.4. Correctness of GCC Protocol 336</p> <p>11.5. Features of GCC Protocol 338</p> <p>11.6. Summary 339</p> <p>11.7. Bibliographical Notes 339</p> <p>11.8. Exercises 339</p> <p><b>12. Grid Transaction Atomicity and Durability 341</b></p> <p>12.1. Motivation 342</p> <p>12.2. Grid Atomic Commit Protocol (Grid-ACP) 343</p> <p>12.2.1. State Diagram of Grid-ACP 343</p> <p>12.2.2. Grid-ACP Algorithm 344</p> <p>12.2.3. Early-Abort Grid-ACP 346</p> <p>12.2.4. Discussion 348</p> <p>12.2.5. Message and Time Complexity Comparison Analysis 349</p> <p>12.2.6. Correctness of Grid-ACP 350</p> <p>12.3. Handling Failure of Sites with Grid-ACP 351</p> <p>12.3.1. Model for Storing Log Files at the Originator and Participating Sites 351</p> <p>12.3.2. Logs Required at the Originator Site 352</p> <p>12.3.3. Logs Required at the Participant Site 353</p> <p>12.3.4. Failure Recovery Algorithm for Grid-ACP 353</p> <p>12.3.5. Comparison of Recovery Protocols 359</p> <p>12.3.6. Correctness of Recovery Algorithm 361</p> <p>12.4. Summary 365</p> <p>12.5. Bibliographical Notes 366</p> <p>12.6. Exercises 366</p> <p><b>13. Replica Management in Grids 367</b></p> <p>13.1. Motivation 367</p> <p>13.2. Replica Architecture 368</p> <p>13.2.1. High-Level Replica Management Architecture 368</p> <p>13.2.2. Some Problems 369</p> <p>13.3. Grid Replica Access Protocol (GRAP) 371</p> <p>13.3.1. Read Transaction Operation for GRAP 371</p> <p>13.3.2. Write Transaction Operation for GRAP 372</p> <p>13.3.3. Revisiting the Example Problem 375</p> <p>13.3.4. Correctness of GRAP 377</p> <p>13.4. Handling Multiple Partitioning 378</p> <p>13.4.1. Contingency GRAP 378</p> <p>13.4.2. Comparison of Replica Management Protocols 381</p> <p>13.4.3. Correctness of Contingency GRAP 383</p> <p>13.5. Summary 384</p> <p>13.6. Bibliographical Notes 385</p> <p>13.7. Exercises 385</p> <p><b>14. Grid Atomic Commitment in Replicated Data 387</b></p> <p>14.1. Motivation 388</p> <p>14.1.1. Architectural Reasons 388</p> <p>14.1.2. Motivating Example 388</p> <p>14.2. Modified Grid Atomic Commitment Protocol 390</p> <p>14.2.1. Modified Grid-ACP 390</p> <p>14.2.2. Correctness of Modified Grid-ACP 393</p> <p>14.3. Transaction Properties in Replicated Environment 395</p> <p>14.4. Summary 397</p> <p>14.5. Bibliographical Notes 397</p> <p>14.6. Exercises 398</p> <p><b>Part V Other Data-Intensive Applications</b></p> <p><b>15. Parallel Online Analytic Processing (OLAP) and Business Intelligence 401</b></p> <p>15.1. Parallel Multidimensional Analysis 402</p> <p>15.2. Parallelization of ROLLUP Queries 405</p> <p>15.2.1. Analysis of Basic Single ROLLUP Queries 405</p> <p>15.2.2. Analysis of Multiple ROLLUP Queries 409</p> <p>15.2.3. Analysis of Partial ROLLUP Queries 411</p> <p>15.2.4. Parallelization Without Using ROLLUP 412</p> <p>15.3. Parallelization of CUBE Queries 412</p> <p>15.3.1. Analysis of Basic CUBE Queries 413</p> <p>15.3.2. Analysis of Partial CUBE Queries 416</p> <p>15.3.3. Parallelization Without Using CUBE 417</p> <p>15.4. Parallelization of Top-<i>N </i>and Ranking Queries 418</p> <p>15.5. Parallelization of Cume_Dist Queries 419</p> <p>15.6. Parallelization of NTILE and Histogram Queries 420</p> <p>15.7. Parallelization of Moving Average and Windowing Queries 422</p> <p>15.8. Summary 424</p> <p>15.9. Bibliographical Notes 424</p> <p>15.10. Exercises 425</p> <p><b>16. Parallel Data Mining—Association Rules and Sequential Patterns 427</b></p> <p>16.1. From Databases To Data Warehousing To Data Mining: A Journey 428</p> <p>16.2. Data Mining: A Brief Overview 431</p> <p>16.2.1. Data Mining Tasks 431</p> <p>16.2.2. Querying vs. Mining 433</p> <p>16.2.3. Parallelism in Data Mining 436</p> <p>16.3. Parallel Association Rules 440</p> <p>16.3.1. Association Rules: Concepts 441</p> <p>16.3.2. Association Rules: Processes 444</p> <p>16.3.3. Association Rules: Parallel Processing 448</p> <p>16.4. Parallel Sequential Patterns 450</p> <p>16.4.1. Sequential Patterns: Concepts 452</p> <p>16.4.2. Sequential Patterns: Processes 456</p> <p>16.4.3. Sequential Patterns: Parallel Processing 459</p> <p>16.5. Summary 461</p> <p>16.6. Bibliographical Notes 461</p> <p>16.7. Exercises 462</p> <p><b>17. Parallel Clustering and Classification 464</b></p> <p>17.1. Clustering and Classification 464</p> <p>17.1.1. Clustering 464</p> <p>17.1.2. Classification 465</p> <p>17.2. Parallel Clustering 467</p> <p>17.2.1. Clustering: Concepts 467</p> <p>17.2.2. k-Means Algorithm 468</p> <p>17.2.3. Parallel<i> k</i>-Means Clustering 471</p> <p>17.3. Parallel Classification 477</p> <p>17.3.1. Decision Tree Classification: Structures 477</p> <p>17.3.2. Decision Tree Classification: Processes 480</p> <p>17.3.3. Decision Tree Classification: Parallel Processing 488</p> <p>17.4. Summary 495</p> <p>17.5. Bibliographical Notes 498</p> <p>17.6. Exercises 498</p> <p>Permissions 501</p> <p>List of Conferences and Journals 507</p> <p>Bibliography 511</p> <p>Index 541</p>
<b>David Taniar,</b> PhD, lectures in information technology at Monash University, Australia. Dr. Taniar has published extensively in the field of high- performance parallel databases and is the Editor in Chief of the International Journal of Data Warehousing and Mining. <p><b>Clement H. C. Leung,</b> PhD, is Foundation Chair in Computer Science at Victoria University, Australia. Dr. Leung previously held the Established Chair in Computer Science at the University of London.</p> <p><b>Wenny Rahayu,</b> PhD, is Associate Professor at La Trobe University, Australia, and actively works in the areas of database design and implementation, covering object-relational databases and Web databases.</p> <p><b>Sushant Goel,</b> PhD, is a software consultant and holds a PhD in computer systems engineering from RMIT University, Australia. His research interests are in grid transaction management and software development processes, such as agile computing.</p>
<b>The latest techniques and principles of parallel and grid database processing</b> <p>The growth in grid databases, coupled with the utility of parallel query processing, presents an important opportunity to understand and utilize high-performance parallel database processing within a major database management system (DBMS). This important new book provides readers with a fundamental understanding of parallelism in data-intensive applications, and demonstrates how to develop faster capabilities to support them. It presents a balanced treatment of the theoretical and practical aspects of high-performance databases to demonstrate how parallel query is executed in a DBMS, including concepts, algorithms, analytical models, and grid transactions.</p> <p><i>High-Performance Parallel Database Processing and Grid Databases</i> serves as a valuable resource for researchers working in parallel databases and for practitioners interested in building a high-performance database. It is also a much-needed, self-contained textbook for database courses at the advanced undergraduate and graduate levels.</p>

Diese Produkte könnten Sie auch interessieren:

The CISO Evolution
The CISO Evolution
von: Matthew K. Sharp, Kyriakos Lambros
PDF ebook
33,99 €
Data Mining and Machine Learning Applications
Data Mining and Machine Learning Applications
von: Rohit Raja, Kapil Kumar Nagwanshi, Sandeep Kumar, K. Ramya Laxmi
EPUB ebook
190,99 €