Details

Algorithms in Bioinformatics



Theory and Implementation
1st edition

by: Paul A. Gagniuc

€103.99

Publisher: Wiley
Format: PDF
Published: 13 July 2021
ISBN/EAN: 9781119697954
Language: English
Number of pages: 528

DRM-protected eBook; to read it you need, for example, Adobe Digital Editions and an Adobe ID.

Descriptions

<b>ALGORITHMS IN BIOINFORMATICS</b> <p><b>Explore a comprehensive and insightful treatment of the practical application of bioinformatic algorithms in a variety of fields</b> <p><i>Algorithms in Bioinformatics: Theory and Implementation</i> delivers an extensive treatment of some of the main algorithms used to explain biological functions and relationships. It introduces readers to the art of algorithms in a practical manner that is linked to biological theory and interpretation. The book covers many key areas of bioinformatics, including <i>global</i> and <i>local sequence alignment</i>, <i>forced alignment</i>, motif detection, <i>sequence logos</i>, <i>Markov chains</i>, and information entropy. It also describes novel approaches such as <i>self-sequence alignment</i>, <i>Objective Digital Stains</i> (ODSs), <i>Spectral Forecast</i>, and the <i>Discrete Probability Detector</i> (DPD) algorithm. <p>The text uses graphical illustrations to highlight the technical details of the computational algorithms it covers, furthering the reader’s understanding and retention of the material. Throughout, the book is written in an accessible and practical manner, showing how the algorithms can be implemented and used in JavaScript in web browsers. The author has included more than 120 open-source implementations of the material, as well as 33 ready-to-use presentations. The book contains original material that has been class-tested by the author, and it examines numerous cases in a biological and medical context. 
Readers will also benefit from the inclusion of: <ul><li>A thorough introduction to biological evolution, including the emergence of life, classifications, and several known theories and molecular mechanisms</li><li>A detailed presentation of new methods, such as <i>self-sequence alignment</i>, <i>Objective Digital Stains</i>, and <i>Spectral Forecast</i></li><li>A treatment of sequence alignment, including local, global, and forced sequence alignment, with full implementations</li><li>Discussions of position-specific matrices, including count, weight, relative-frequency, and log-likelihood matrices</li><li>A detailed presentation of methods related to Markov chains, together with a description of their implementation in bioinformatics and adjacent fields</li><li>An examination of information and entropy, including sequence logos and an explanation of their meaning</li><li>An exploration of the current state of bioinformatics, including what is known and which issues are usually avoided in the field</li><li>A chapter on philosophical transactions that gives the reader a broader view of the prediction process</li><li>Native computer implementations in the context of bioinformatics</li><li>Extensive worked examples with detailed case studies that explain the meaning of different results</li></ul> <p>Perfect for professionals and researchers in biology, medicine, engineering, and information technology, as well as upper-level undergraduate students in these fields, <i>Algorithms in Bioinformatics: Theory and Implementation</i> will also earn a place in the libraries of software engineers who wish to understand how to implement bioinformatic algorithms in their products.
<p><b>Preface   xv</b></p> <p><b>About the Companion Website xvii</b></p> <p><b>1              The Tree of Life (I)  1</b></p> <p>1.1          Introduction 1</p> <p>1.2          Emergence of Life 1</p> <p>1.2.1      Timeline Disagreements 3</p> <p>1.3          Classifications and Mechanisms 4</p> <p>1.4          Chromatin Structure 5</p> <p>1.5          Molecular Mechanisms 9</p> <p>1.5.1      Precursor Messenger RNA 9</p> <p>1.5.2      Precursor Messenger RNA to Messenger RNA 10</p> <p>1.5.3      Classes of Introns 10</p> <p>1.5.4      Messenger RNA 10</p> <p>1.5.5      mRNA to Proteins 11</p> <p>1.5.6      Transfer RNA 12</p> <p>1.5.7      Small RNA 12</p> <p>1.5.8      The Transcriptome 13</p> <p>1.5.9      Gene Networks and Information Processing 13</p> <p>1.5.10    Eukaryotic vs. Prokaryotic Regulation 14</p> <p>1.5.11    What Is Life? 14</p> <p>1.6          Known Species 14</p> <p>1.7          Approaches for Compartmentalization 15</p> <p>1.7.1      Two Main Approaches for Organism Formation 16</p> <p>1.7.2      Size and Metabolism 16</p> <p>1.8          Sizes in Eukaryotes 16</p> <p>1.8.1      Sizes in Unicellular Eukaryotes 17</p> <p>1.8.2      Sizes in Multicellular Eukaryotes 17</p> <p>1.9          Sizes in Prokaryotes 17</p> <p>1.10        Virus Sizes 18</p> <p>1.10.1    Viruses vs. 
the Spark of Metabolism 20</p> <p>1.11        The Diffusion Coefficient 20</p> <p>1.12        The Origins of Eukaryotic Cells 21</p> <p>1.12.1    Endosymbiosis Theory 21</p> <p>1.12.2    DNA and Organelles 22</p> <p>1.12.3    Membrane-bound Organelles with DNA 23</p> <p>1.12.4    Membrane-bound Organelles Without DNA 23</p> <p>1.12.5    Control and Division of Organelles 24</p> <p>1.12.6    The Horizontal Gene Transfer 24</p> <p>1.12.7    On the Mechanisms of Horizontal Gene Transfer 25</p> <p>1.13        Origins of Eukaryotic Multicellularity 26</p> <p>1.13.1    Colonies Inside an Early Unicellular Common Ancestor 26</p> <p>1.13.2    Colonies of Early Unicellular Common Ancestors 26</p> <p>1.13.3    Colonies of Inseparable Early Unicellular Common Ancestors</p> <p>1.13.4    Chimerism and Mosaicism 28</p> <p>1.14        Conclusions 29</p> <p><b>2              Tree of Life: Genomes (II)   31</b></p> <p>2.1          Introduction 31</p> <p>2.2          Rules of Engagement 31</p> <p>2.3          Genome Sizes in the Tree of Life 32</p> <p>2.3.1      Alternative Methods 33</p> <p>2.3.2      The Weaving of Scales 33</p> <p>2.3.3      Computations on the Average Genome Size 36</p> <p>2.3.4      Observations on Data 38</p> <p>2.4          Organellar Genomes 40</p> <p>2.4.1      Chloroplasts 40</p> <p>2.4.2      Apicoplasts 40</p> <p>2.4.3      Chromatophores 42</p> <p>2.4.4      Cyanelles 42</p> <p>2.4.5      Kinetoplasts 42</p> <p>2.4.6      Mitochondria 43</p> <p>2.5          Plasmids 43</p> <p>2.6          Virus Genomes 44</p> <p>2.7          Viroids and Their Implications 46</p> <p>2.8          Genes vs. 
Proteins in the Tree of Life 47</p> <p>2.9          Conclusions 49</p> <p><b>3              Sequence Alignment (I)   51</b></p> <p>3.1          Introduction 51</p> <p>3.2          Style and Visualization 51</p> <p>3.3          Initialization of the Score Matrix 54</p> <p>3.4          Calculation of Scores 57</p> <p>3.4.1      Initialization of the Score Matrix for Global Alignment 57</p> <p>3.4.2      Initialization of the Score Matrix for Local Alignment 62</p> <p>3.4.3      Optimization of the Initialization Steps 65</p> <p>3.4.4      Curiosities 66</p> <p>3.5          Traceback   71</p> <p>3.6          Global Alignment 75</p> <p>3.7          Local Alignment 79</p> <p>3.8          Alignment Layout 84</p> <p>3.9          Local Sequence Alignment – The Final Version 87</p> <p>3.10        Complementarity 91</p> <p>3.11        Conclusions 97</p> <p><b>4              Forced Alignment (II)   99</b></p> <p>4.1          Introduction 99</p> <p>4.2          Global and Local Sequence Alignment 100</p> <p>4.2.1      Short Notes 100</p> <p>4.2.2      Understanding the Technology 101</p> <p>4.2.3      Main Objectives 102</p> <p>4.3          Experiments and Discussions 102</p> <p>4.3.1      Alignment Layout 106</p> <p>4.3.2      Forced Alignment Regime 106</p> <p>4.3.3      Alignment Scores and Significance 109</p> <p>4.3.4      Optimal Alignments 110</p> <p>4.3.5      The Main Significance Scores 110</p> <p>4.3.6      The Information Content 110</p> <p>4.3.7      The Match Percentage 112</p> <p>4.3.8      Significance vs. 
Chance 113</p> <p>4.3.9      The Importance of Randomness 113</p> <p>4.3.10    Sequence Quality and the Score Matrix 114</p> <p>4.3.11    The Significance Threshold 115</p> <p>4.3.12    Optimal Alignments by Numbers 116</p> <p>4.3.13    Chaos Theory on Sequence Alignment 116</p> <p>4.3.14    Image-Encoding Possibilities 116</p> <p>4.4          Advanced Features and Methods 117</p> <p>4.4.1      Sequence Detector 117</p> <p>4.4.2      Parameters 117</p> <p>4.4.3      Heatmap 118</p> <p>4.4.4      Text Visualization 123</p> <p>4.4.5      Graphics for Manuscript Figures and Didactic Presentations 124</p> <p>4.4.6      Dynamics 124</p> <p>4.4.7      Independence 125</p> <p>4.4.8      Limits 125</p> <p>4.4.9      Local Storage 125</p> <p>4.5          Conclusions 128</p> <p><b>5              Self-Sequence Alignment (I)  129</b></p> <p>5.1          Introduction 129</p> <p>5.2          True Randomness 130</p> <p>5.3          Information and Compression Algorithms 130</p> <p>5.4          White Noise and Biological Sequences 131</p> <p>5.5          The Mathematical Model 131</p> <p>5.5.1      A Concrete Example 132</p> <p>5.5.2      Model Dissection 133</p> <p>5.5.3      Conditions for Maxima and Minima 136</p> <p>5.6          Noise vs. Redundancy 137</p> <p>5.7          Global and Local Information Content 137</p> <p>5.8          Signal Sensitivity 138</p> <p>5.9          Implementation 140</p> <p>5.9.1      Global Self-Sequence Alignment 140</p> <p>5.9.2      Local Self-Sequence Alignment 144</p> <p>5.10        A Complete Scanner for Information Content 147</p> <p>5.11        Conclusions 149</p> <p><b>6              Frequencies and Percentages (II)   151</b></p> <p>6.1          Introduction 151</p> <p>6.2          Base Composition 152</p> <p>6.3          Percentage of Nucleotide Combinations 152</p> <p>6.4          Implementation 153</p> <p>6.5          A Frequency Scanner 156</p> <p>6.6          Examples of Known Significance 158</p> <p>6.7          Observation vs. 
Expectation 160</p> <p>6.8          A Frequency Scanner with a Threshold 161</p> <p>6.9          Conclusions 163</p> <p><b>7              Objective Digital Stains (III)  165</b></p> <p>7.1          Introduction 165</p> <p>7.2          Information and Frequency 166</p> <p>7.3          The Objective Digital Stain 169</p> <p>7.3.1      A 3D Representation Over a 2D Plane 173</p> <p>7.3.2      ODSs Relative to the Background 177</p> <p>7.4          Interpretation of ODSs 181</p> <p>7.5          The Significance of the Areas in the ODS 183</p> <p>7.6          Discussions 184</p> <p>7.6.1      A Similarity Between Dissimilar Sequences 186</p> <p>7.7          Conclusions 186</p> <p><b>8              Detection of Motifs (I)   187</b></p> <p>8.1          Introduction 187</p> <p>8.2          DNA Motifs 187</p> <p>8.2.1      DNA-binding Proteins vs. Motifs and Degeneracy 188</p> <p>8.2.2      Concrete Examples of DNA Motifs 188</p> <p>8.3          Major Functions of DNA Motifs 191</p> <p>8.3.1      RNA Splicing and DNA Motifs 191</p> <p>8.4          Conclusions 195</p> <p><b>9              Representation of Motifs (II)   197</b></p> <p>9.1          Introduction 197</p> <p>9.2          The Training Data 197</p> <p>9.3          A Visualization Function 198</p> <p>9.4          The Alignment Matrix 200</p> <p>9.5          Alphabet Detection 203</p> <p>9.6          The Position-Specific Scoring Matrix (PSSM) Initialization 206</p> <p>9.7          The Position Frequency Matrix (PFM)  207</p> <p>9.8          The Position Probability Matrix (PPM)  208</p> <p>9.8.1      A Kind of PPM Pseudo-Scanner 209</p> <p>9.9          The Position Weight Matrix (PWM) 212</p> <p>9.10        The Background Model 215</p> <p>9.11        The Consensus Sequence 218</p> <p>9.11.1    The Consensus – Not Necessarily Functional 219</p> <p>9.12        Mutational Intolerance 221</p> <p>9.13        From Motifs to PWMs 222</p> <p>9.14        Pseudo-Counts and Negative Infinity 226</p> <p>9.15        Conclusions 
229</p> <p><b>10           The Motif Scanner (III)   231</b></p> <p>10.1        Introduction 231</p> <p>10.2        Looking for Signals 232</p> <p>10.3        A Functional Scanner 235</p> <p>10.4        The Meaning of Scores 239</p> <p>10.4.1    A Score Value Above Zero 239</p> <p>10.4.2    A Score Value Below Zero  241</p> <p>10.4.3    A Score Value of Zero 241</p> <p>10.5        Conclusions 242</p> <p><b>11           Understanding the Parameters (IV)   243</b></p> <p>11.1        Introduction 243</p> <p>11.2        Experimentation 243</p> <p>11.2.1    A Scanner Implementation Based on Pseudo-Counts 244</p> <p>11.2.2    A Scanner Implementation Based on Propagation of Zero Counts 246</p> <p>11.3        Signal Discrimination 249</p> <p>11.4        False-Positive Results 250</p> <p>11.5        Sensitivity Adjustments 251</p> <p>11.6        Beyond Bioinformatics 252</p> <p>11.7        A Scanner That Uses a Known PWM 253</p> <p>11.8        Signal Thresholds 256</p> <p>11.8.1    Implementation and Filter Testing 258</p> <p>11.9        Conclusions 262</p> <p><b>12           Dynamic Backgrounds (V)   263</b></p> <p>12.1        Introduction 263</p> <p>12.2        Toward a Scanner with Two PFMs 263</p> <p>12.2.1    The Implementation of Dynamic PWMs 264</p> <p>12.2.2    Issues and Corrections for Dynamic PWMs 271</p> <p>12.2.3    Solutions for Aberrant Positive Likelihood Values 274</p> <p>12.3        A Scanner with Two PFMs 280</p> <p>12.4        Information and Background Frequencies on Score Values 283</p> <p>12.5        Dynamic Background vs. 
Null Model 285</p> <p>12.6        Conclusions 285</p> <p><b>13           Markov Chains: The Machine (I)   287</b></p> <p>13.1        Introduction 287</p> <p>13.2        Transition Matrices 287</p> <p>13.3        Discrete Probability Detector 292</p> <p>13.3.1    Alphabet Detection 292</p> <p>13.3.2    Matrix Initialization  293</p> <p>13.3.3    Frequency Detection  295</p> <p>13.3.4    Calculation of Transition Probabilities 297</p> <p>13.3.5    Particularities in Calculating the Transition Probabilities 306</p> <p>13.4        Markov Chains Generators 307</p> <p>13.4.1    The Experiment 308</p> <p>13.4.2    The Implementation 312</p> <p>13.4.3    Simulation of Transition Probabilities 315</p> <p>13.4.4    The Markov machine  315</p> <p>13.4.5    Result Verification  317</p> <p>13.5        Conclusions 318</p> <p><b>14           Markov Chains: Log Likelihood (II)  319</b></p> <p>14.1        Introduction 319</p> <p>14.2        The Log-Likelihood Matrix 319</p> <p>14.2.1    A Log-Likelihood Matrix Based on the Null Model 320</p> <p>14.2.2    A Log-Likelihood Matrix Based on Two Models 322</p> <p>14.3        Interpretation and Use of the Log-Likelihood Matrix  326</p> <p>14.4        Construction of a Markov Scanner 328</p> <p>14.5        A Scanner That Uses a Known LLM 337</p> <p>14.6        The Meaning of Scores 340</p> <p>14.7        Beyond Bioinformatics 344</p> <p>14.8        Conclusions 345</p> <p><b>15           Spectral Forecast (I)   347</b></p> <p>15.1        Introduction 347</p> <p>15.2        The Spectral Forecast Model 347</p> <p>15.3        The Spectral Forecast Equation 349</p> <p>15.4        The Spectral Forecast Inner Workings 350</p> <p>15.4.1    Each Part on a Single Matrix  351</p> <p>15.4.2    Both Parts on a Single Matrix  352</p> <p>15.4.3    Both Parts on Separate Matrices 353</p> <p>15.4.4    Concrete Example 1  354</p> <p>15.4.5    Concrete Example 2  357</p> <p>15.4.6    Concrete Example 3  359</p> <p>15.5        Implementations 360</p> 
<p>15.5.1    Spectral Forecast for Signals 362</p> <p>15.5.2    What Does the Value of d Mean? 364</p> <p>15.5.3    Spectral Forecast for Matrices 368</p> <p>15.6        The Spectral Forecast Model for Predictions 372</p> <p>15.6.1    The Spectral Forecast Model for Signals 372</p> <p>15.6.2    Experiments on the Similarity Index Values 381</p> <p>15.6.3    The Spectral Forecast Model for Matrices 384</p> <p>15.7        Conclusions 389</p> <p><b>16           Entropy vs. Content (I)   391</b></p> <p>16.1        Introduction 391</p> <p>16.2        Information Entropy 391</p> <p>16.3        Implementation 395</p> <p>16.4        Information Content vs. Information Entropy 400</p> <p>16.4.1    Implementation 403</p> <p>16.4.2    Additional Considerations 409</p> <p>16.5        Conclusions 409</p> <p><b>17           Philosophical Transactions    411</b></p> <p>17.1        Introduction 411</p> <p>17.2        The Frame of Reference 411</p> <p>17.2.1    The Fundamental Layer of Complexity 412</p> <p>17.2.2    On the Complexity of Life 414</p> <p>17.3        Random vs. Pseudo-random 415</p> <p>17.4        Random Numbers and Noise 418</p> <p>17.5        Determinism and Chaos 419</p> <p>17.5.1    Chaos Without Noise 420</p> <p>17.5.2    Chaos with Noise 427</p> <p>17.5.3    Limits of Prediction 430</p> <p>17.5.4    On the Wings of Chaos 431</p> <p>17.6        Free Will and Determinism 431</p> <p>17.6.1    The Greatest Disappointment 432</p> <p>17.6.2    The Most Powerful Processor in Existence 433</p> <p>17.6.3    Certainty vs. 
Interpretation 435</p> <p>17.6.4    A Wisdom that Applies 436</p> <p>17.7        Conclusions 439</p> <p><b>Appendix A         441</b></p> <p>A.1         Association of Numerical Values with Letters 441</p> <p>A.2         Sorting Values on Columns 443</p> <p>A.3         The Implementation of a Sequence Logo 446</p> <p>A.4         Sequence Logos Based on Maximum Values 451</p> <p>A.5         Using Logarithms to Build Sequence Logos 455</p> <p>A.6         From a Motif Set to a Sequence Logo 459</p> <p><b>References   467</b></p> <p><b>Index   489</b></p>
<p><b>Paul A. Gagniuc, PhD,</b> is an Associate Professor of Bioinformatics and a Professor of Programming Languages at the University Politehnica of Bucharest in Romania. He obtained his doctorate in Genetics at the University of Bucharest. Dr. Gagniuc is also an Academic Editor at PLoS ONE and a proactive reviewer for several well-known scientific journals. He has published numerous high-profile scientific articles and has received several awards for exceptional scientific results.</p>

You may also be interested in these products:

Statistics for Microarrays
by: Ernst Wit, John McClure
PDF ebook
€84.99