Details

Multiple Biological Sequence Alignment

Scoring Functions, Algorithms and Evaluation
Wiley Series in Bioinformatics, 1st ed.

by: Ken Nguyen, Xuan Guo, Yi Pan, Albert Y. Zomaya
€97.99
Publisher:	Wiley
Format:	PDF
Published:	10 June 2016
ISBN/EAN:	9781119272458
Language:	English
Number of pages:	256

DRM-protected eBook; to read it you need, for example, Adobe Digital Editions and an Adobe ID.

Descriptions

Title description

Covers the fundamentals and techniques of multiple biological sequence alignment and analysis, and shows readers how to choose the appropriate sequence analysis tools for their tasks.

This book describes the traditional and modern approaches to biological sequence alignment and homology search. It contains 11 chapters. Chapter 1 provides basic information on biological sequences. Chapter 2 covers the fundamentals of pairwise sequence alignment, while Chapters 3 and 4 examine popular quantitative models and practical clustering techniques that have been used in multiple sequence alignment. Chapter 5 describes, characterizes and relates many multiple sequence alignment models. Chapter 6 describes how phylogenetic trees have traditionally been constructed and how available sequence knowledge bases can be used to improve the accuracy of reconstructed phylogenetic trees. Chapter 7 covers the latest methods developed to improve the run-time efficiency of multiple sequence alignment. Chapter 8 covers several popular multiple sequence alignment servers and services, and Chapter 9 examines several multiple sequence alignment techniques developed to handle the short sequences (reads) produced by next-generation sequencing (NGS). Chapter 10 describes a bioinformatics application that takes multiple sequence alignments of short reads or whole genomes as input. Lastly, Chapter 11 reviews RNA and protein secondary structure prediction using the evolutionary information inferred from multiple sequence alignments.
• Covers the full spectrum of the field, from alignment algorithms to scoring methods, practical techniques, and alignment tools and their evaluations
• Describes theories and developments of scoring functions and scoring matrices
• Examines phylogeny estimation and large-scale homology search

Multiple Biological Sequence Alignment: Scoring Functions, Algorithms and Evaluation is a reference for researchers, engineers, and graduate and postgraduate students in bioinformatics and systems biology, and for molecular biologists.

Ken Nguyen, PhD, is an associate professor at Clayton State University, GA, USA. He received his PhD, MSc and BSc degrees in computer science, all from Georgia State University. His research interests are in databases, parallel and distributed computing, and bioinformatics. He was a Molecular Basis of Disease fellow at Georgia State and is the recipient of the highest graduate honor at Georgia State, the William M. Suttles Graduate Fellowship.

Xuan Guo, PhD, is a postdoctoral associate at Oak Ridge National Lab, USA. He received his PhD degree in computer science from Georgia State University in 2015. His research interests are in bioinformatics, machine learning, and cloud computing. He is an editorial assistant for the International Journal of Bioinformatics Research and Applications.

Yi Pan, PhD, is a Regents' Professor of Computer Science and an Interim Associate Dean and Chair of Biology at Georgia State University. He received his BE and ME in computer engineering from Tsinghua University in China and his PhD in computer science from the University of Pittsburgh. Dr. Pan's research interests include parallel and distributed computing, optical networks, wireless networks and bioinformatics. He has published more than 180 journal papers, about 60 of them in various IEEE/ACM journals. He is co-editor, along with Albert Y. Zomaya, of the Wiley Series in Bioinformatics.

Table of contents

Preface

1 Introduction
  1.1 Motivation
  1.2 The Organization of this Book
  1.3 Sequence Fundamentals
    1.3.1 Protein
    1.3.2 DNA/RNA
    1.3.3 Sequence Formats
    1.3.4 Motifs
    1.3.5 Sequence Databases

2 Protein/DNA/RNA Pairwise Sequence Alignment
  2.1 Sequence Alignment Fundamentals
  2.2 Dot-Plot Matrix
  2.3 Dynamic Programming
    2.3.1 Needleman–Wunsch’s Algorithm
    2.3.2 Example
    2.3.3 Smith–Waterman’s Algorithm
    2.3.4 Affine Gap Penalty
  2.4 Word Method
    2.4.1 Example
  2.5 Searching Sequence Databases
    2.5.1 FASTA
    2.5.2 BLAST

3 Quantifying Sequence Alignments
  3.1 Evolution and Measuring Evolution
    3.1.1 Jukes and Cantor’s Model
    3.1.2 Measuring Relatedness
  3.2 Substitution Matrices and Scoring Matrices
    3.2.1 Identity Scores
    3.2.2 Substitution/Mutation Scores
  3.3 Gaps
    3.3.1 Sequence Distances
    3.3.2 Example
  3.4 Scoring Multiple Sequence Alignments
    3.4.1 Sum-of-Pair Score
  3.5 Circular Sum Score
  3.6 Conservation Score Schemes
    3.6.1 Wu and Kabat’s Method
    3.6.2 Jores’s Method
    3.6.3 Lockless and Ranganathan’s Method
  3.7 Diversity Scoring Schemes
    3.7.1 Background
    3.7.2 Methods
  3.8 Stereochemical Property Methods
    3.8.1 Valdar’s Method
  3.9 Hierarchical Expected Matching Probability Scoring Metric (HEP)
    3.9.1 Building an AACCH Scoring Tree
    3.9.2 The Scoring Metric
    3.9.3 Proof of Scoring Metric Correctness
    3.9.4 Examples
    3.9.5 Scoring Metric and Sequence Weighting Factor
    3.9.6 Evaluation Data Sets
    3.9.7 Evaluation Results

4 Sequence Clustering
  4.1 Unweighted Pair Group Method with Arithmetic Mean – UPGMA
  4.2 Neighborhood-Joining Method – NJ
  4.3 Overlapping Sequence Clustering

5 Multiple Sequences Alignment Algorithms
  5.1 Dynamic Programming
    5.1.1 DCA
  5.2 Progressive Alignment
    5.2.1 Clustal Family
    5.2.2 PIMA: Pattern-Induced Multisequence Alignment
    5.2.3 PRIME: Profile-Based Randomized Iteration Method
    5.2.4 DIAlign
  5.3 Consistency and Probabilistic MSA
    5.3.1 POA: Partial Order Graph Alignment
    5.3.2 PSAlign
    5.3.3 ProbCons: Probabilistic Consistency-Based Multiple Sequence Alignment
    5.3.4 T-Coffee: Tree-Based Consistency Objective Function for Alignment Evaluation
    5.3.5 MAFFT: MSA Based on Fast Fourier Transform
    5.3.6 AVID
    5.3.7 Eulerian Path MSA
  5.4 Genetic Algorithms
    5.4.1 SAGA: Sequence Alignment by Genetic Algorithm
    5.4.2 GA and Self-Organizing Neural Networks
    5.4.3 FAlign
  5.5 New Development in Multiple Sequence Alignment Algorithms
    5.5.1 KB-MSA: Knowledge-Based Multiple Sequence Alignment
    5.5.2 PADT: Progressive Multiple Sequence Alignment Based on Dynamic Weighted Tree
  5.6 Test Data and Alignment Methods
  5.7 Results
    5.7.1 Measuring Alignment Quality
    5.7.2 RT-OSM Results

6 Phylogeny in Multiple Sequence Alignments
  6.1 The Tree of Life
  6.2 Phylogeny Construction
    6.2.1 Distance Methods
    6.2.2 Character-Based Methods
    6.2.3 Maximum Likelihood Methods
    6.2.4 Bootstrapping
    6.2.5 Subtree Pruning and Re-grafting
  6.3 Inferring Phylogeny from Multiple Sequence Alignments

7 Multiple Sequence Alignment on High-Performance Computing Models
  7.1 Parallel Systems
    7.1.1 Multiprocessor
    7.1.2 Vector
    7.1.3 GPU
    7.1.4 FPGA
    7.1.5 Reconfigurable Mesh
  7.2 Existing Parallel Multiple Sequence Alignment
  7.3 Reconfigurable-Mesh Computing Models (R-Mesh)
  7.4 Pairwise Dynamic Programming Algorithms
    7.4.1 R-Mesh Max Switches
    7.4.2 R-Mesh Adder/Subtractor
    7.4.3 Constant-Time Dynamic Programming on R-Mesh
    7.4.4 Affine Gap Cost
    7.4.5 R-Mesh On/Off Switches
    7.4.6 Dynamic Programming Backtracking on R-Mesh
  7.5 Progressive Multiple Sequence Alignment on R-Mesh
    7.5.1 Hierarchical Clustering on R-Mesh
    7.5.2 Constant Run-Time Sum-of-Pair Scoring Method
    7.5.3 Parallel Progressive MSA Algorithm and Its Complexity Analysis

8 Sequence Analysis Services
  8.1 EMBL-EBI: European Bioinformatics Institute
  8.2 NCBI: National Center for Biotechnology Information
  8.3 GenomeNet and Data Bank of Japan
  8.4 Other Sequence Analysis and Alignment Web Servers
  8.5 SeqAna: Multiple Sequence Alignment with Quality Ranking
  8.6 Pairwise Sequence Alignment and Other Analysis Tools
  8.7 Tool Evaluation

9 Multiple Sequence for Next-Generation Sequences
  9.1 Introduction
  9.2 Overview of Next Generation Sequence Alignment Algorithms
    9.2.1 Alignment Algorithms Based on Seeding and Hash Tables
    9.2.2 Alignment Algorithms Based on Suffix Tries
  9.3 Next-Generation Sequencing Tools

10 Multiple Sequence Alignment for Variations Detection
  10.1 Introduction
  10.2 Genetic Variants
  10.3 Variation Detection Methods Based on MSA
  10.4 Evaluation Methodology
    10.4.1 Performance Metrics
    10.4.2 Simulated Sequence Data
    10.4.3 Real Sequence Data
  10.5 Conclusion and Future Work

11 Multiple Sequence Alignment for Structure Detection
  11.1 Introduction
  11.2 RNA Secondary Structure Prediction Based on MSA
    11.2.1 Common Information in Multiple Aligned RNA Sequences
    11.2.2 Review of RNA SS Prediction Methods
    11.2.3 Measures of Quality of RNA SS Prediction
  11.3 Protein Secondary Structure Prediction Based on MSA
    11.3.1 Review of Protein Secondary Structure Prediction Methods
    11.3.2 Measures of Quality of Protein SS Prediction
  11.4 Conclusion and Future Work

References
Index

Review

"Covers the full spectrum of the field, from alignment algorithms to scoring methods, practical techniques, and alignment tools and their evaluations." (Zentralblatt MATH, 2016)
