Details

Data Analytics in the AWS Cloud


Building a Data Platform for BI and Predictive Analytics on AWS
1st ed.

by: Joe Minichino

38,99 €

Publisher: Wiley
Format: PDF
Published: 31 March 2023
ISBN/EAN: 9781119909262
Language: English
Number of pages: 416

DRM-protected eBook; to read it you need, for example, Adobe Digital Editions and an Adobe ID.

Descriptions

<p><b>A comprehensive and accessible roadmap to performing data analytics in the AWS cloud</b> <p>In <i>Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS</i>, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. In the book, you’ll explore every relevant aspect of data analytics (from data engineering to analysis, business intelligence, DevOps, and MLOps) as you discover how to integrate machine learning predictions with analytics engines and visualization tools. <p>You’ll also find: <ul> <li>Real-world use cases of AWS architectures that demystify the applications of data analytics</li> <li>Accessible introductions to data acquisition, importation, storage, visualization, and reporting</li> <li>Expert insights into serverless data engineering and how to use it to reduce overhead and costs, improve stability, and simplify maintenance</li></ul><p>A can't-miss for data architects, analysts, engineers, and technical professionals, <i>Data Analytics in the AWS Cloud</i> will also earn a place on the bookshelves of business leaders seeking a better understanding of data analytics on the AWS cloud platform.
<p>Introduction xxiii</p>
<p><b>Chapter 1 AWS Data Lakes and Analytics Technology Overview 1</b></p> <p>Why AWS? 1</p> <p>What Does a Data Lake Look Like in AWS? 2</p> <p>Analytics on AWS 3</p> <p>Skills Required to Build and Maintain an AWS Analytics Pipeline 3</p>
<p><b>Chapter 2 The Path to Analytics: Setting Up a Data and Analytics Team 5</b></p> <p>The Data Vision 6</p> <p>Support 6</p> <p>DA Team Roles 7</p> <p>Early Stage Roles 7</p> <p>Team Lead 8</p> <p>Data Architect 8</p> <p>Data Engineer 8</p> <p>Data Analyst 9</p> <p>Maturity Stage Roles 9</p> <p>Data Scientist 9</p> <p>Cloud Engineer 10</p> <p>Business Intelligence (BI) Developer 10</p> <p>Machine Learning Engineer 10</p> <p>Business Analyst 11</p> <p>Niche Roles 11</p> <p>Analytics Flow at a Process Level 12</p> <p>Workflow Methodology 12</p> <p>The DA Team Mantra: “Automate Everything” 14</p> <p>Analytics Models in the Wild: Centralized, Distributed, Center of Excellence 15</p> <p>Centralized 15</p> <p>Distributed 16</p> <p>Center of Excellence 16</p> <p>Summary 17</p>
<p><b>Chapter 3 Working on AWS 19</b></p> <p>Accessing AWS 20</p> <p>Everything Is a Resource 21</p> <p>S3: An Important Exception 21</p> <p>IAM: Policies, Roles, and Users 22</p> <p>Policies 22</p> <p>Identity-Based Policies 24</p> <p>Resource-Based Policies 25</p> <p>Roles 25</p> <p>Users and User Groups 25</p> <p>Summarizing IAM 26</p> <p>Working with the Web Console 26</p> <p>The AWS Command-Line Interface 29</p> <p>Installing AWS cli 29</p> <p>Linux Installation 30</p> <p>macOS Installation 30</p> <p>Windows 31</p> <p>Configuring AWS cli 31</p> <p>A Note on Region 33</p> <p>Setting Individual Parameters 33</p> <p>Using Profiles and Configuration Files 33</p> <p>Final Notes on Configuration 36</p> <p>Using the AWS cli 36</p> <p>Using Skeletons and File Inputs 39</p> <p>Cleaning Up! 43</p> <p>Infrastructure-as-Code: CloudFormation and Terraform 44</p> <p>CloudFormation 44</p> <p>CloudFormation Stacks 46</p> <p>CloudFormation Template Anatomy 47</p> <p>CloudFormation Changesets 52</p> <p>Getting Stack Information 55</p> <p>Cleaning Up Again 57</p> <p>CloudFormation Conclusions 58</p> <p>Terraform 58</p> <p>Coding Style 58</p> <p>Modularity 59</p> <p>Limitations 59</p> <p>Terraform vs. CloudFormation 60</p> <p>Infrastructure-as-Code: CDK, Pulumi, Cloudcraft, and Other Solutions 60</p> <p>AWS CDK 60</p> <p>Pulumi 62</p> <p>Cloudcraft 62</p> <p>Infrastructure Management Conclusions 63</p>
<p><b>Chapter 4 Serverless Computing and Data Engineering 65</b></p> <p>Serverless vs. Fully Managed 65</p> <p>AWS Serverless Technologies 66</p> <p>AWS Lambda 67</p> <p>Pricing Model 67</p> <p>Laser Focus on Code 68</p> <p>The Lambda Paradigm Shift 69</p> <p>Virtually Infinite Scalability 70</p> <p>Geographical Distribution 70</p> <p>A Lambda Hello World 71</p> <p>Lambda Configuration 74</p> <p>Runtime 74</p> <p>Container-Based Lambdas 75</p> <p>Architectures 75</p> <p>Memory 75</p> <p>Networking 76</p> <p>Execution Role 76</p> <p>Environment Variables 76</p> <p>AWS EventBridge 77</p> <p>AWS Fargate 77</p> <p>AWS DynamoDB 77</p> <p>AWS SNS 77</p> <p>Amazon SQS 78</p> <p>AWS CloudWatch 78</p> <p>Amazon QuickSight 78</p> <p>AWS Step Functions 78</p> <p>Amazon API Gateway 79</p> <p>Amazon Cognito 79</p> <p>AWS Serverless Application Model (SAM) 79</p> <p>Ephemeral Infrastructure 80</p> <p>AWS SAM Installation 80</p> <p>Configuration 80</p> <p>Creating Your First AWS SAM Project 81</p> <p>Application Structure 83</p> <p>SAM Resource Types 85</p> <p>SAM Lambda Template 86</p> <p>!! Recursive Lambda Invocation !! 88</p> <p>Function Metadata 88</p> <p>Outputs 89</p> <p>Implicitly Generated Resources 89</p> <p>Other Template Sections 90</p> <p>Lambda Code 90</p> <p>Building Your First SAM Application 93</p> <p>Testing the AWS SAM Application Locally 96</p> <p>Deployment 99</p> <p>Cleaning Up 104</p> <p>Summary 104</p>
<p><b>Chapter 5 Data Ingestion 105</b></p> <p>AWS Data Lake Architecture 106</p> <p>Serverless Data Lake Architecture Structure 106</p> <p>Ingestion 106</p> <p>Storage and Processing 108</p> <p>Cataloging, Governance, and Search 108</p> <p>Security and Monitoring 109</p> <p>Consumption 109</p> <p>Sample Processing Architecture: Cataloging Images into DynamoDB 109</p> <p>Use Case Description 109</p> <p>SAM Application Creation 110</p> <p>S3-Triggered Lambda 111</p> <p>Adding DynamoDB 119</p> <p>Lambda Execution Context 121</p> <p>Inserting into DynamoDB 121</p> <p>Cleaning Up 123</p> <p>Serverless Ingestion 124</p> <p>AWS Fargate 124</p> <p>AWS Lambda 124</p> <p>Example Architecture: Fargate-Based Periodic Batch Import 125</p> <p>The Basic Importer 125</p> <p>ECS CLI 128</p> <p>AWS Copilot cli 128</p> <p>Clean Up 136</p> <p>AWS Kinesis Ingestion 136</p> <p>Example Architecture: Two-Pronged Delivery 137</p> <p>Fully Managed Ingestion with AppFlow 146</p> <p>Operational Data Ingestion with Database Migration Service 151</p> <p>DMS Concepts 151</p> <p>DMS Instance 151</p> <p>DMS Endpoints 152</p> <p>DMS Tasks 152</p> <p>Summary of the Workflow 152</p> <p>Common Use of DMS 153</p> <p>Example Architecture: DMS to S3 154</p> <p>DMS Instance 154</p> <p>DMS Endpoints 156</p> <p>DMS Task 162</p> <p>Summary 167</p>
<p><b>Chapter 6 Processing Data 169</b></p> <p>Phases of Data Preparation 170</p> <p>What Is ETL? Why Should I Care? 170</p> <p>ETL Job vs. Streaming Job 171</p> <p>Overview of ETL in AWS 172</p> <p>ETL with AWS Glue 172</p> <p>ETL with Lambda Functions 172</p> <p>ETL with Hadoop/EMR 173</p> <p>Other Ways to Perform ETL 173</p> <p>ETL Job Design Concepts 173</p> <p>Source Identification 174</p> <p>Destination Identification 174</p> <p>Mappings 174</p> <p>Validation 174</p> <p>Filter 175</p> <p>Join, Denormalization, Relationalization 175</p> <p>AWS Glue for ETL 176</p> <p>Really, It’s Just Spark 176</p> <p>Visual 176</p> <p>Spark Script Editor 177</p> <p>Python Shell Script Editor 177</p> <p>Jupyter Notebook 177</p> <p>Connectors 177</p> <p>Creating Connections 178</p> <p>Creating Connections with the Web Console 178</p> <p>Creating Connections with the AWS cli 179</p> <p>Creating ETL Jobs with AWS Glue Visual Editor 184</p> <p>ETL Example: Format Switch from Raw (JSON) to Cleaned (Parquet) 184</p> <p>Job Bookmarks 187</p> <p>Transformations 188</p> <p>Apply Mapping 189</p> <p>Filter 189</p> <p>Other Available Transforms 190</p> <p>Run the Edited Job 191</p> <p>Visual Editor with Source and Target Conclusions 192</p> <p>Creating ETL Jobs with AWS Glue Visual Editor (without Source and Target) 192</p> <p>Creating ETL Jobs with the Spark Script Editor 192</p> <p>Developing ETL Jobs with AWS Glue Notebooks 193</p> <p>What Is a Notebook? 194</p> <p>Notebook Structure 194</p> <p>Step 1: Load Code into a DynamicFrame 196</p> <p>Step 2: Apply Field Mapping 197</p> <p>Step 3: Apply the Filter 197</p> <p>Step 4: Write to S3 in Parquet Format 198</p> <p>Example: Joining and Denormalizing Data from Two S3 Locations 199</p> <p>Conclusions for Manually Authored Jobs with Notebooks 203</p> <p>Creating ETL Jobs with AWS Glue Interactive Sessions 204</p> <p>It’s Magic 205</p> <p>Development Workflow 206</p> <p>Streaming Jobs 207</p> <p>Differences with a Standard ETL Job 208</p> <p>Streaming Sources 208</p> <p>Example: Process Kinesis Streams with a Streaming Job 208</p> <p>Streaming ETL Jobs Conclusions 217</p> <p>Summary 217</p>
<p><b>Chapter 7 Cataloging, Governance, and Search 219</b></p> <p>Cataloging with AWS Glue 219</p> <p>AWS Glue and the AWS Glue Data Catalog 219</p> <p>Glue Databases and Tables 220</p> <p>Databases 220</p> <p>The Idea of Schema-on-Read 221</p> <p>Tables 222</p> <p>Create Table Manually 223</p> <p>Creating a Table from an Existing Schema 225</p> <p>Creating a Table with a Crawler 225</p> <p>Summary on Databases and Tables 226</p> <p>Crawlers 226</p> <p>Updating or Not Updating? 230</p> <p>Running the Crawler 231</p> <p>Creating a Crawler from the AWS CLI 231</p> <p>Retrieving Table Information from the CLI 233</p> <p>Classifiers 235</p> <p>Classifier Example 236</p> <p>Crawlers and Classifiers Summary 237</p> <p>Search with Amazon Athena: The Heart of Analytics in AWS 238</p> <p>A Bit of History 238</p> <p>Interface Overview 238</p> <p>Creating Tables Manually 239</p> <p>Athena Data Types 240</p> <p>Complex Types 241</p> <p>Running a Query 242</p> <p>Connecting with JDBC and ODBC 243</p> <p>Query Stats 243</p> <p>Recent Queries and Saved Queries 243</p> <p>The Power of Partitions 244</p> <p>Athena Pricing Model 244</p> <p>Automatic Naming 245</p> <p>Athena Query Output 246</p> <p>Athena Peculiarities (SQL and Not) 246</p> <p>Computed Fields Gotcha and WITH Statement Workaround 246</p> <p>Lowercase! 247</p> <p>Query Explain 248</p> <p>Deduplicating Records 249</p> <p>Working with JSON, Flattening, and Unnesting 250</p> <p>Athena Views 251</p> <p>Create Table as Select (CTAS) 252</p> <p>Saving Queries and Reusing Saved Queries 253</p> <p>Running Parameterized Queries 254</p> <p>Athena Federated Queries 254</p> <p>Athena Lambda Connectors 255</p> <p>Note on Connection Errors 256</p> <p>Performing Federated Queries 257</p> <p>Creating a View from a Federated Query 258</p> <p>Governing: Athena Workgroups, Lake Formation, and More 258</p> <p>Athena Workgroups 259</p> <p>Fine-Grained Athena Access with IAM 262</p> <p>Recap of Athena-Based Governance 264</p> <p>AWS Lake Formation 265</p> <p>Registering a Location in Lake Formation 266</p> <p>Creating a Database in Lake Formation 268</p> <p>Assigning Permissions in Lake Formation 269</p> <p>LF-Tags and Permissions in Lake Formation 271</p> <p>Data Filters 277</p> <p>Governance Conclusions 279</p> <p>Summary 280</p>
<p><b>Chapter 8 Data Consumption: BI, Visualization, and Reporting 283</b></p> <p>QuickSight 283</p> <p>Signing Up for QuickSight 284</p> <p>Standard Plan 284</p> <p>Enterprise Plan 284</p> <p>Users and User Groups 285</p> <p>Managing Users and Groups 285</p> <p>Managing QuickSight 286</p> <p>Users and Groups 287</p> <p>Your Subscriptions 287</p> <p>SPICE Capacity 287</p> <p>Account Settings 287</p> <p>Security and Permissions 287</p> <p>VPC Connections 288</p> <p>Mobile Settings 289</p> <p>Domains and Embedding 289</p> <p>Single Sign-On 289</p> <p>Data Sources and Datasets 289</p> <p>Creating an Athena Data Source 291</p> <p>Creating Other Data Sources 292</p> <p>Creating a Data Source from the AWS cli 292</p> <p>Creating a Dataset from a Table 294</p> <p>Creating a Dataset from a SQL Query 295</p> <p>Duplicating Datasets 296</p> <p>Note on Creating Datasets 297</p> <p>QuickSight Favorites, Recent, and Folders 297</p> <p>SPICE 298</p> <p>Manage SPICE Capacity 298</p> <p>Refresh Schedule 299</p> <p>QuickSight Data Editor 299</p> <p>QuickSight Data Types 302</p> <p>Change Data Types 302</p> <p>Calculated Fields 303</p> <p>Joining Data 305</p> <p>Excluding Fields 309</p> <p>Filtering Data 309</p> <p>Removing Data 310</p> <p>Geospatial Hierarchies and Adding Fields to Hierarchies 310</p> <p>Unsupported Format Dates 311</p> <p>Visualizing Data: QuickSight Analysis 312</p> <p>Adding a Title and a Description to Your Analysis 313</p> <p>Renaming the Sheet 314</p> <p>Your First Visual with AutoGraph 314</p> <p>Field Wells 314</p> <p>Visuals Types 315</p> <p>Saving and Autosaving 316</p> <p>A First Example: Pie Chart 316</p> <p>Renaming a Visual 317</p> <p>Filtering Data 318</p> <p>Adding Drill-Downs 320</p> <p>Parameters 321</p> <p>Actions 324</p> <p>Insights 328</p> <p>ML-Powered Insights 330</p> <p>Sharing an Analysis 335</p> <p>Dashboards 335</p> <p>Dashboard Layouts and Themes 335</p> <p>Publishing a Dashboard 336</p> <p>Embedding Visuals and Dashboards 337</p> <p>Data Consumption: Not Only Dashboards 337</p> <p>Summary 338</p>
<p><b>Chapter 9 Machine Learning at Scale 339</b></p> <p>Machine Learning and Artificial Intelligence 339</p> <p>What Are ML/AI Use Cases? 340</p> <p>Types of ML Models 340</p> <p>Overview of ML/AI AWS Solutions 341</p> <p>Amazon SageMaker 341</p> <p>SageMaker Domains 342</p> <p>Adding a User to the Domain 344</p> <p>SageMaker Studio 344</p> <p>SageMaker Example Notebook 346</p> <p>Step 1: Prerequisites and Preprocessing 346</p> <p>Step 2: Data Ingestion 347</p> <p>Step 3: Data Inspection 348</p> <p>Step 4: Data Conversion 349</p> <p>Step 5: Upload Training Data 349</p> <p>Step 6: Train the Model 349</p> <p>Step 7: Set Up Hosting and Deploy the Model 351</p> <p>Step 8: Validate the Model 352</p> <p>Step 9: Use the Model 353</p> <p>Inference 353</p> <p>Real Time 354</p> <p>Asynchronous 354</p> <p>Serverless 354</p> <p>Batch Transform 354</p> <p>Data Wrangler 356</p> <p>SageMaker Canvas 357</p> <p>Summary 358</p>
<p><b>Appendix Example Data Architectures in AWS 359</b></p> <p>Modern Data Lake Architecture 360</p> <p>ETL in a Lake House 361</p> <p>Consuming Data in the Lake House 361</p> <p>The Modern Data Lake Architecture 362</p> <p>Batch Processing 362</p> <p>Stream Processing 363</p> <p>Architecture Design Recommendations 364</p> <p>Automate Everything 365</p> <p>Build on Events 365</p> <p>Performance = Cost Savings 365</p> <p>AWS Glue Catalog and Athena-Centric Workflow 365</p> <p>Design Flexible 365</p> <p>Pick Your Battles 365</p> <p>Parquet 366</p> <p>Summary 366</p>
<p>Index 367</p>
<p><b>GIONATA “JOE” MINICHINO</b> is Principal Software Engineer and Data Architect on the Data & Analytics Team at Teamwork. He specializes in cloud computing, machine/deep learning, and artificial intelligence and designs end-to-end Amazon Web Services pipelines that move large quantities of diverse data for analysis and visualization.
<p><b>Accessible and hands-on guidance for data analytics solutions built on the AWS cloud</b> <p>In <i>Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS</i>, veteran data architect and software engineer Joe Minichino delivers an insightful and practical blueprint for storing, processing, and analyzing data on the Amazon Web Services cloud platform. The author explains every relevant aspect of AWS data analytics (from data engineering to analysis, business intelligence, DevOps, MLOps, and more) as he walks you through how to integrate machine learning predictions with analytics engines and data visualization tools. <p>The book includes real-world case studies of businesses using AWS architectures to apply cutting-edge data analytics and offers comprehensive coverage of data acquisition, importation, storage, visualization, and reporting in the Amazon cloud environment. It also shares expert insights into serverless data engineering, showing you how to use it to reduce overhead and costs, simplify maintenance, and improve stability across the board. <p>An essential resource for data analysts, architects, and engineers, <i>Data Analytics in the AWS Cloud</i> will also benefit a wide variety of technical professionals and business leaders who seek a fuller understanding of how to use Amazon Web Services to enable and deploy superior data analytics solutions.
