Details

Data Engineering and Data Science


Concepts and Applications
1st ed.

by: Kukatlapalli Pradeep Kumar, Aynur Unal, Vinay Jha Pillai, Hari Murthy, M. Niranjanamurthy

190,99 €

Publisher: Wiley
Format: EPUB
Published: 29 August 2023
ISBN/EAN: 9781119841975
Language: English
Number of pages: 464

DRM-protected eBook; to read it you need, e.g., Adobe Digital Editions and an Adobe ID.

Descriptions

DATA ENGINEERING and DATA SCIENCE

Written and edited by one of the most prolific and well-known experts in the field and his team, this exciting new volume is the “one-stop shop” for the concepts and applications of data science and engineering for data scientists across many industries.

The field of data science is incredibly broad, encompassing everything from cleaning data to deploying predictive models. However, it is rare for any single data scientist to work across the whole spectrum day to day. Data scientists usually focus on a few areas and are complemented by a team of other scientists and analysts. Data engineering is also a broad field, and no individual data engineer needs to know the whole spectrum of skills. Data engineering is the aspect of data science that focuses on practical applications of data collection and analysis. For all the work that data scientists do to answer questions using large sets of information, there have to be mechanisms for collecting and validating that information.

In this exciting new volume, the team of editors and contributors sketch the broad outlines of data engineering, then walk through more specific descriptions that illustrate specific data engineering roles. Data-driven discovery is revolutionizing the modeling, prediction, and control of complex systems. This book brings together machine learning, engineering mathematics, and mathematical physics to integrate modeling and control of dynamical systems with modern methods in data science. It highlights many of the recent advances in scientific computing that enable data-driven methods to be applied to a diverse range of complex systems, such as turbulence, the brain, climate, epidemiology, finance, robotics, and autonomy. Whether for the veteran engineer or scientist working in the field or laboratory, or the student or academic, this is a must-have for any library.

Preface

1 Quality Assurance in Data Science: Need, Challenges and Focus
Jasmine K.S., Ajay D. K. and Aditya Raj
1.1 Introduction
1.2 Testing and Quality Assurance
1.3 Product Quality and Test Efforts
1.4 Data Masking in Data Model and Associated Risks
1.5 Prediction in Data Science
1.6 Role of Metrics in Evaluation
1.7 Quantity of Data in Quality Assurance
1.8 Identifying the Right Data Sources
1.9 Conclusion

2 Design and Implementation of Social Media Mining -- Knowledge Discovery Methods for Effective Digital Marketing Strategies
Prashant Bhat and Pradnya Malaganve
2.1 Introduction
2.2 Literature Review
2.3 Novel Framework for Social Media Data Mining and Knowledge Discovery
2.4 Classification for Comparison Analysis
2.5 Clustering Methodology to Provide Digital Marketing Strategies
2.6 Experimental Results
2.7 Conclusion

3 A Study on Big Data Engineering Using Cloud Data Warehouse
Manjunath T. N., Pushpa S. K., Ravindra S. Hegadi and Ananya Hathwar K. S.
3.1 Introduction
3.2 Comparison Study of Different Cloud Data Warehouses
3.3 Snowflake Cloud Data Warehouse
3.4 Google BigQuery Cloud Data Warehouse
3.5 Microsoft Azure Synapse Cloud Data Warehouse
3.6 Informatica Intelligent Cloud Services (IICS)
3.7 Conclusion

4 Data Mining with Cluster Analysis Through Partitioning Approach of Huge Transaction Data
Sampath Kini K. and Karthik Pai B.H.
4.1 Introduction
4.2 Methodology Used in Proposed Cluster Analysis System
4.3 Literature Survey on Existing Systems
4.4 Conclusion

5 Application of Data Science in Macromodeling of Nonlinear Dynamical Systems
Nagaraj S., Seshachalam D. and Jayalatha G.
5.1 Introduction
5.2 Nonlinear Autonomous Dynamical System
5.3 Nonlinear System - MOR
5.4 Data Science Life Cycle
5.5 Artificial Neural Network in Modeling
5.6 Neuron Spiking Model Using FitzHugh-Nagumo (F-N) System
5.7 Ring Oscillator Model
5.8 Nonlinear VLSI Interconnect Model Using Telegraph Equation
5.9 Macromodel Using Machine Learning
5.10 MOR of Dynamical Systems Using POD-ANN
5.11 Numerical Results
5.12 Conclusion

6 Comparative Analysis of Various Ensemble Approaches for Web Page Classification
J. Dutta, Yong Woon Kim and Dalia Dominic
6.1 Introduction
6.2 Literature Survey
6.3 Material and Methods
6.4 Ensemble Classifiers
6.5 Results
6.6 Conclusion

7 Feature Engineering and Selection Approach Over Malicious Image
P.M. Kavitha and B. Muruganantham
7.1 Introduction
7.2 Feature Engineering Techniques
7.3 Malicious Feature Engineering
7.4 Image Processing Technique
7.5 Image Processing Techniques for Analysis on Malicious Images
7.6 Conclusion

8 Cubic-Regression and Likelihood Based Boosting GAM to Model Drug Sensitivity for Glioblastoma
Satyawant Kumar, Vinai George Biju, Ho-Kyoung Lee and Blessy Baby Mathew
8.1 Introduction
8.2 Literature Survey
8.3 Materials and Methods
8.4 Evaluations, Results and Discussions

9 Unobtrusive Engagement Detection through Semantic Pose Estimation and Lightweight ResNet for an Online Class Environment
Michael Moses Thiruthuvanathan, Balachandran Krishnan and Madhavi Rangaswamy
9.1 Introduction
9.2 Related Work
9.3 Proposed Methodology
9.4 Experimentation
9.5 Results and Discussions

10 Building Rule Base for Decision Making -- A Fuzzy-Rough Approach
Sabu M. K., Neeraj Krishna M. S. and Reshmi R.
10.1 Introduction
10.2 Literature Review
10.3 Discretization of the Dataset Using Fuzzy Set Theory
10.4 Description of the Dataset
10.5 Process Involved in Proposed Work
10.6 Experiment
10.7 Evaluation Result
10.8 Discussion

11 An Effective Machine Learning Approach to Model Healthcare Data
Shaila H. Koppad, S. Anupama Kumar and Mohan Kumar
11.1 Introduction
11.2 Types of Data in Healthcare
11.3 Big Data in Healthcare
11.4 Different V’s of Big Data
11.5 About COPD
11.6 Methodology Implemented

12 Recommendation Engine for Retail Domain Using Machine Learning Techniques
Chandrashekhara K. T., Gireesh Babu C. N. and Thungamani M.
12.1 Introduction
12.2 Proposed System
12.3 Results
12.3.1 ARIMA Forecasting
12.4 Conclusion

13 Mining Heterogeneous Lung Cancer from Computer Tomography (CT) Scan with the Confusion Matrix
Denny Dominic and Krishnan Balachandran
13.1 Introduction
13.2 Literature Review
13.3 Methodology
13.4 Result
13.5 Conclusion and Future Scope
References

14 ML Algorithms and Their Approach on COVID-19 Data Analysis
Kambaluru Ashok, Penumalli Anvesh Reddy and Kukatlapalli Pradeep Kumar
14.1 Introduction
14.2 DataSet
14.3 Types of Machine Learning Algorithms
14.4 Conclusion

15 Analysis and Design for the Early Stage Detection of Lung Diseases Using Machine Learning Algorithms
Sindhu Madhuri, Mahesh T. R., Vivek V., Shashikala H. K. and C. Saravanan
15.1 Introduction
15.2 Machine Learning Algorithms
15.3 Evaluation Metrics and Comparative Results for Early Detection of Lung Diseases
15.4 Conclusion

16 Estimation of Cancer Risk through Artificial Neural Network
K. Aditya Shastry, Sanjay H. A., Balaji N. and Karthik Pai B. H.
16.1 Introduction
16.2 Case Studies Related to Cancer Risk Estimation Using ANN
16.3 Datasets Used in Cancer Risk Estimation
16.4 Discussion
16.5 Future Scope
16.6 Conclusion

17 Applications and Advancements in Data Science and Analytics
T. Mamatha, A. Balaram, B. Rama Subba Reddy, C. Shoba Bindu and M. Niranjanamurthy
17.1 Data Science and Analytics in Software Testing
17.2 Applications of Data Science and Analytics
17.3 Selenium Testing Tool in Data Science
17.4 Challenges and Advancements in Data Science
17.5 Data Science and Analytics Tools
17.6 Conclusion
References

About the Editors

Index

Kukatlapalli Pradeep Kumar, PhD, is an associate professor and the Program Coordinator for Data Science at Christ University, Bangalore, India. He has 13 years of research and academic experience. He has published in many journals and presented numerous conference papers.

Aynur Unal, PhD, educated at Stanford University (class of ’73), has taught at Stanford University for almost 40 years and established the Acoustics Institute. Her work on “New Transform Domains for the Onset of Failures” received a prestigious research award.

Vinay Jha Pillai, PhD, is an associate professor in the Department of Electronics and Communication Engineering at CHRIST University, Bangalore, India. He has 12 years of academic experience and holds two patents. He has also completed two funded projects as principal investigator.

Hari Murthy, PhD, is a faculty member in the Department of Electronics and Communication Engineering, CHRIST University, Bengaluru, India. He completed his PhD at the University of Canterbury, New Zealand, where his thesis was on novel anticorrosion materials. He has authored book chapters, published papers in international journals and conferences, and served on the program committees of several international conferences.

M. Niranjanamurthy, PhD, is an assistant professor in the Department of Computer Applications, M S Ramaiah Institute of Technology, Bangalore, Karnataka. He earned his PhD in computer science at JJTU, Rajasthan, India. He has over 11 years of teaching experience and two years of industry experience as a software engineer. He has published several books, and he is working on numerous books for Scrivener Publishing. He has published over 60 papers in scholarly journals and conferences, and he serves as a reviewer for 22 scientific journals. He also has numerous awards to his credit.

You might also be interested in these products:

Statistics for Microarrays
by: Ernst Wit, John McClure
PDF ebook
90,99 €