Details

Pattern Recognition in Computational Molecular Biology

Techniques and Approaches
Wiley Series in Bioinformatics 1. Aufl.

von: Mourad Elloumi, Costas Iliopoulos, Jason T. L. Wang, Albert Y. Zomaya, Yi Pan
123,99 €
Verlag:	Wiley
Format:	PDF
Veröffentl.:	30.11.2015
ISBN/EAN:	9781119078852
Sprache:	englisch
Anzahl Seiten:	656

In den Warenkorb

Als Gutschein

DRM-geschütztes eBook, Sie benötigen z.B. Adobe Digital Editions und eine Adobe ID zum Lesen.

Beschreibungen

Titelbeschreibung

A comprehensive overview of high-performance pattern recognition techniques and approaches to Computational Molecular Biology This book surveys the developments of techniques and approaches on pattern recognition related to Computational Molecular Biology. Providing a broad coverage of the field, the authors cover fundamental and technical information on these techniques and approaches, as well as discussing their related problems. The text consists of twenty nine chapters, organized into seven parts: Pattern Recognition in Sequences, Pattern Recognition in Secondary Structures, Pattern Recognition in Tertiary Structures, Pattern Recognition in Quaternary Structures, Pattern Recognition in Microarrays, Pattern Recognition in Phylogenetic Trees, and Pattern Recognition in Biological Networks. <ul> <li>Surveys the development of techniques and approaches on pattern recognition in biomolecular data</li> <li>Discusses pattern recognition in primary, secondary, tertiary and quaternary structures, as well as microarrays, phylogenetic trees and biological networks</li> <li>Includes case studies and examples to further illustrate the concepts discussed in the book</li> </ul> Pattern Recognition in Computational Molecular Biology: Techniques and Approaches is a reference for practitioners and professional researches in Computer Science, Life Science, and Mathematics. This book also serves as a supplementary reading for graduate students and young researches interested in Computational Molecular Biology.

Inhaltsverzeichnis

LIST OF CONTRIBUTORS xxi PREFACE xxvii I PATTERN RECOGNITION IN SEQUENCES 1 1 COMBINATORIAL HAPLOTYPING PROBLEMS 3 Giuseppe Lancia 1.1 Introduction / 3 1.2 Single Individual Haplotyping / 5 1.2.1 The Minimum Error Correction Model / 8 1.2.2 Probabilistic Approaches and Alternative Models / 10 1.3 Population Haplotyping / 12 1.3.1 Clark’s Rule / 14 1.3.2 Pure Parsimony / 15 1.3.3 Perfect Phylogeny / 19 1.3.4 Disease Association / 21 1.3.5 Other Models / 22 References / 23 2 ALGORITHMIC PERSPECTIVES OF THE STRING BARCODING PROBLEMS 28 Sima Behpour and Bhaskar DasGupta 2.1 Introduction / 28 2.2 Summary of Algorithmic Complexity Results for Barcoding Problems / 32 2.2.1 Average Length of Optimal Barcodes / 33 2.3 Entropy-Based Information Content Technique for Designing Approximation Algorithms for String Barcoding Problems / 34 2.4 Techniques for Proving Inapproximability Results for String Barcoding Problems / 36 2.4.1 Reductions from Set Covering Problem / 36 2.4.2 Reduction from Graph-Coloring Problem / 38 2.5 Heuristic Algorithms for String Barcoding Problems / 39 2.5.1 Entropy-Based Method with a Different Measure for Information Content / 39 2.5.2 Balanced Partitioning Approach / 40 2.6 Conclusion / 40 Acknowledgments / 41 References / 41 3 ALIGNMENT-FREE MEASURES FOR WHOLE-GENOME COMPARISON 43 Matteo Comin and Davide Verzotto 3.1 Introduction / 43 3.2 Whole-Genome Sequence Analysis / 44 3.2.1 Background on Whole-Genome Comparison / 44 3.2.2 Alignment-Free Methods / 45 3.2.3 Average Common Subword / 46 3.2.4 Kullback–Leibler Information Divergence / 47 3.3 Underlying Approach / 47 3.3.1 Irredundant Common Subwords / 48 3.3.2 Underlying Subwords / 49 3.3.3 Efficient Computation of Underlying Subwords / 50 3.3.4 Extension to Inversions and Complements / 53 3.3.5 A Distance-Like Measure Based on Underlying Subwords / 53 3.4 Experimental Results / 54 3.4.1 Genome Data sets and Reference Taxonomies / 54 3.4.2 Whole-Genome Phylogeny Reconstruction / 56 3.5 Conclusion / 61 Author’s Contributions / 62 Acknowledgments / 62 References / 62 4 A MAXIMUM LIKELIHOOD FRAMEWORK FOR MULTIPLE SEQUENCE LOCAL ALIGNMENT 65 Chengpeng Bi 4.1 Introduction / 65 4.2 Multiple Sequence Local Alignment / 67 4.2.1 Overall Objective Function / 67 4.2.2 Maximum Likelihood Model / 68 4.3 Motif Finding Algorithms / 70 4.3.1 DEM Motif Algorithm / 70 4.3.2 WEM Motif Finding Algorithm / 70 4.3.3 Metropolis Motif Finding Algorithm / 72 4.3.4 Gibbs Motif Finding Algorithm / 73 4.3.5 Pseudo-Gibbs Motif Finding Algorithm / 74 4.4 Time Complexity / 75 4.5 Case Studies / 75 4.5.1 Performance Evaluation / 76 4.5.2 CRP Binding Sites / 76 4.5.3 Multiple Motifs in Helix–Turn–Helix Protein Structure / 78 4.6 Conclusion / 80 References / 81 5 GLOBAL SEQUENCE ALIGNMENT WITH A BOUNDED NUMBER OF GAPS 83 Carl Barton, Tomáš Flouri, Costas S. Iliopoulos, and Solon P. Pissis 5.1 Introduction / 83 5.2 Definitions and Notation / 85 5.3 Problem Definition / 87 5.4 Algorithms / 88 5.5 Conclusion / 94 References / 95 II PATTERN RECOGNITION IN SECONDARY STRUCTURES 97 6 A SHORT REVIEW ON PROTEIN SECONDARY STRUCTURE PREDICTION METHODS 99 Renxiang Yan, Jiangning Song, Weiwen Cai, and Ziding Zhang 6.1 Introduction / 99 6.2 Representative Protein Secondary Structure Prediction Methods / 102 6.2.1 Chou–Fasman / 103 6.2.2 GOR / 104 6.2.3 PHD / 104 6.2.4 PSIPRED / 104 6.2.5 SPINE-X / 105 6.2.6 PSSpred / 105 6.2.7 Meta Methods / 105 6.3 Evaluation of Protein Secondary Structure Prediction Methods / 106 6.3.1 Measures / 106 6.3.2 Benchmark / 106 6.3.3 Performances / 107 6.4 Conclusion / 110 Acknowledgments / 110 References / 111 7 A GENERIC APPROACH TO BIOLOGICAL SEQUENCE SEGMENTATION PROBLEMS: APPLICATION TO PROTEIN SECONDARY STRUCTURE PREDICTION 114 Yann Guermeur and Fabien Lauer 7.1 Introduction / 114 7.2 Biological Sequence Segmentation / 115 7.3 MSVMpred / 117 7.3.1 Base Classifiers / 117 7.3.2 Ensemble Methods / 118 7.3.3 Convex Combination / 119 7.4 Postprocessing with A Generative Model / 119 7.5 Dedication to Protein Secondary Structure Prediction / 120 7.5.1 Biological Problem / 121 7.5.2 MSVMpred2 / 121 7.5.3 Hidden Semi-Markov Model / 122 7.5.4 Experimental Results / 122 7.6 Conclusions and Ongoing Research / 125 Acknowledgments / 126 References / 126 8 STRUCTURAL MOTIF IDENTIFICATION AND RETRIEVAL: A GEOMETRICAL APPROACH 129 Virginio Cantoni, Marco Ferretti, Mirto Musci, and Nahumi Nugrahaningsih 8.1 Introduction / 129 8.2 A Few Basic Concepts / 130 8.2.1 Hierarchy of Protein Structures / 130 8.2.2 Secondary Structure Elements / 131 8.2.3 Structural Motifs / 132 8.2.4 Available Sources for Protein Data / 134 8.3 State of the Art / 135 8.3.1 Protein Structure Motif Search / 135 8.3.2 Promotif / 136 8.3.3 Secondary-Structure Matching / 137 8.3.4 Multiple Structural Alignment by Secondary Structures / 138 8.4 A Novel Geometrical Approach to Motif Retrieval / 138 8.4.1 Secondary Structures Cooccurrences / 138 8.4.2 Cross Motif Search / 143 8.4.3 Complete Cross Motif Search / 146 8.5 Implementation Notes / 149 8.5.1 Optimizations / 149 8.5.2 Parallel Approaches / 150 8.6 Conclusions and Future Work / 151 Acknowledgment / 152 References / 152 9 GENOME-WIDE SEARCH FOR PSEUDOKNOTTED NONCODING RNAs: A COMPARATIVE STUDY 155 Meghana Vasavada, Kevin Byron, Yang Song, and Jason T.L. Wang 9.1 Introduction / 155 9.2 Background / 156 9.2.1 Noncoding RNAs and Their Secondary Structures / 156 9.2.2 Pseudoknotted ncRNA Search Tools / 157 9.3 Methodology / 157 9.4 Results and Interpretation / 161 9.5 Conclusion / 162 References / 163 III PATTERN RECOGNITION IN TERTIARY STRUCTURES 165 10 MOTIF DISCOVERY IN PROTEIN 3D-STRUCTURES USING GRAPH MINING TECHNIQUES 167 Wajdi Dhifli and Engelbert Mephu Nguifo 10.1 Introduction / 167 10.2 From Protein 3D-Structures to Protein Graphs / 169 10.2.1 Parsing Protein 3D-Structures into Graphs / 169 10.3 Graph Mining / 172 10.4 Subgraph Mining / 173 10.5 Frequent Subgraph Discovery / 173 10.5.1 Problem Definition / 174 10.5.2 Candidates Generation / 176 10.5.3 Frequent Subgraph Discovery Approaches / 177 10.5.4 Variants of Frequent Subgraph Mining: Closed and Maximal Subgraphs / 178 10.6 Feature Selection / 179 10.6.1 Relevance of a Feature / 179 10.7 Feature Selection for Subgraphs / 180 10.7.1 Problem Statement / 180 10.7.2 Mining Top-k Subgraphs / 180 10.7.3 Clustering-Based Subgraph Selection / 181 10.7.4 Sampling-Based Approaches / 181 10.7.5 Approximate Subgraph Mining / 181 10.7.6 Discriminative Subgraph Selection / 182 10.7.7 Other Significant Subgraph Selection Approaches / 182 10.8 Discussion / 183 10.9 Conclusion / 185 Acknowledgments / 185 References / 186 11 FUZZY AND UNCERTAIN LEARNING TECHNIQUES FOR THE ANALYSIS AND PREDICTION OF PROTEIN TERTIARY STRUCTURES 190 Chinua Umoja, Xiaxia Yu, and Robert Harrison 11.1 Introduction / 190 11.2 Genetic Algorithms / 192 11.2.1 GA Model Selection in Protein Structure Prediction / 196 11.2.2 Common Methodology / 198 11.3 Supervised Machine Learning Algorithm / 201 11.3.1 Artificial Neural Networks / 201 11.3.2 ANNs in Protein Structure Prediction / 202 11.3.3 Support Vector Machines / 203 11.4 Fuzzy Application / 204 11.4.1 Fuzzy Logic / 204 11.4.2 Fuzzy SVMs / 204 11.4.3 Adaptive-Network-Based Fuzzy Inference Systems / 205 11.4.4 Fuzzy Decision Trees / 206 11.5 Conclusion / 207 References / 208 12 PROTEIN INTER-DOMAIN LINKER PREDICTION 212 Maad Shatnawi, Paul D. Yoo, and Sami Muhaidat 12.1 Introduction / 212 12.2 Protein Structure Overview / 213 12.3 Technical Challenges and Open Issues / 214 12.4 Prediction Assessment / 215 12.5 Current Approaches / 216 12.5.1 DomCut / 216 12.5.2 Scooby-Domain / 217 12.5.3 FIEFDom / 218 12.5.4 Chatterjee et al. (2009) / 219 12.5.5 Drop / 219 12.6 Domain Boundary Prediction Using Enhanced General Regression Network / 220 12.6.1 Multi-Domain Benchmark Data Set / 220 12.6.2 Compact Domain Profile / 221 12.6.3 The Enhanced Semi-Parametric Model / 222 12.6.4 Training, Testing, and Validation / 225 12.6.5 Experimental Results / 226 12.7 Inter-Domain Linkers Prediction Using Compositional Index and Simulated Annealing / 227 12.7.1 Compositional Index / 228 12.7.2 Detecting the Optimal Set of Threshold Values Using Simulated Annealing / 229 12.7.3 Experimental Results / 230 12.8 Conclusion / 232 References / 233 13 PREDICTION OF PROLINE CIS–TRANS ISOMERIZATION 236 Paul D. Yoo, Maad Shatnawi, Sami Muhaidat, Kamal Taha, and Albert Y. Zomaya 13.1 Introduction / 236 13.2 Methods / 238 13.2.1 Evolutionary Data Set Construction / 238 13.2.2 Protein Secondary Structure Information / 239 13.2.3 Method I: Intelligent Voting / 239 13.2.4 Method II: Randomized Meta-Learning / 241 13.2.5 Model Validation and Testing / 242 13.2.6 Parameter Tuning / 242 13.3 Model Evaluation and Analysis / 243 13.4 Conclusion / 245 References / 245 IV PATTERN RECOGNITION IN QUATERNARY STRUCTURES 249 14 PREDICTION OF PROTEIN QUATERNARY STRUCTURES 251 Akbar Vaseghi, Maryam Faridounnia, Soheila Shokrollahzade, Samad Jahandideh, and Kuo-Chen Chou 14.1 Introduction / 251 14.2 Protein Structure Prediction / 255 14.2.1 Secondary Structure Prediction / 255 14.2.2 Modeling of Tertiary Structure / 256 14.3 Template-Based Predictions / 257 14.3.1 Homology Modeling / 257 14.3.2 Threading Methods / 257 14.3.3 Ab initio Modeling / 257 14.4 Critical Assessment of Protein Structure Prediction / 258 14.5 Quaternary Structure Prediction / 258 14.6 Conclusion / 261 Acknowledgments / 261 References / 261 15 COMPARISON OF PROTEIN QUATERNARY STRUCTURES BY GRAPH APPROACHES 266 Sheng-Lung Peng and Yu-Wei Tsay 15.1 Introduction / 266 15.2 Similarity in the Graph Model / 268 15.2.1 Graph Model for Proteins / 270 15.3 Measuring Structural Similarity VIA MCES / 272 15.3.1 Problem Formulation / 273 15.3.2 Constructing P-Graphs / 274 15.3.3 Constructing Line Graphs / 276 15.3.4 Constructing Modular Graphs / 276 15.3.5 Maximum Clique Detection / 277 15.3.6 Experimental Results / 277 15.4 Protein Comparison VIA Graph Spectra / 279 15.4.1 Graph Spectra / 279 15.4.2 Matrix Selection / 281 15.4.3 Graph Cospectrality and Similarity / 283 15.4.4 Cospectral Comparison / 283 15.4.5 Experimental Results / 284 15.5 Conclusion / 287 References / 287 16 STRUCTURAL DOMAINS IN PREDICTION OF BIOLOGICAL PROTEIN–PROTEIN INTERACTIONS 291 Mina Maleki, Michael Hall, and Luis Rueda 16.1 Introduction / 291 16.2 Structural Domains / 293 16.3 The Prediction Framework / 293 16.4 Feature Extraction and Prediction Properties / 294 16.4.1 Physicochemical Properties / 296 16.4.2 Domain-Based Properties / 298 16.5 Feature Selection / 299 16.5.1 Filter Methods / 299 16.5.2 Wrapper Methods / 301 16.6 Classification / 301 16.6.1 Linear Dimensionality Reduction / 301 16.6.2 Support Vector Machines / 303 16.6.3 k-Nearest Neighbor / 303 16.6.4 Naive Bayes / 304 16.7 Evaluation and Analysis / 304 16.8 Results and Discussion / 304 16.8.1 Analysis of the Prediction Properties / 304 16.8.2 Analysis of Structural DDIs / 307 16.9 Conclusion / 309 References / 310 V PATTERN RECOGNITION IN MICROARRAYS 315 17 CONTENT-BASED RETRIEVAL OF MICROARRAY EXPERIMENTS 317 Hasan O?gul 17.1 Introduction / 317 17.2 Information Retrieval: Terminology and Background / 318 17.3 Content-Based Retrieval / 320 17.4 Microarray Data and Databases / 322 17.5 Methods for Retrieving Microarray Experiments / 324 17.6 Similarity Metrics / 327 17.7 Evaluating Retrieval Performance / 329 17.8 Software Tools / 330 17.9 Conclusion and Future Directions / 331 Acknowledgment / 332 References / 332 18 EXTRACTION OF DIFFERENTIALLY EXPRESSED GENES IN MICROARRAY DATA 335 Tiratha Raj Singh, Brigitte Vannier, and Ahmed Moussa 18.1 Introduction / 335 18.2 From Microarray Image to Signal / 336 18.2.1 Signal from Oligo DNA Array Image / 336 18.2.2 Signal from Two-Color cDNA Array / 337 18.3 Microarray Signal Analysis / 337 18.3.1 Absolute Analysis and Replicates in Microarrays / 338 18.3.2 Microarray Normalization / 339 18.4 Algorithms for De Gene Selection / 339 18.4.1 Within–Between DE Gene (WB-DEG) Selection Algorithm / 340 18.4.2 Comparison of the WB-DEGs with Two Classical DE Gene Selection Methods on Latin Square Data / 341 18.5 Gene Ontology Enrichment and Gene Set Enrichment Analysis / 343 18.6 Conclusion / 345 References / 345 19 CLUSTERING AND CLASSIFICATION TECHNIQUES FOR GENE EXPRESSION PROFILE PATTERN ANALYSIS 347 Emanuel Weitschek, Giulia Fiscon, Valentina Fustaino, Giovanni Felici, and Paola Bertolazzi 19.1 Introduction / 347 19.2 Transcriptome Analysis / 348 19.3 Microarrays / 349 19.3.1 Applications / 349 19.3.2 Microarray Technology / 350 19.3.3 Microarray Workflow / 350 19.4 RNA-Seq / 351 19.5 Benefits and Drawbacks of RNA-Seq and Microarray Technologies / 353 19.6 Gene Expression Profile Analysis / 356 19.6.1 Data Definition / 356 19.6.2 Data Analysis / 357 19.6.3 Normalization and Background Correction / 357 19.6.4 Genes Clustering / 359 19.6.5 Experiment Classification / 361 19.6.6 Software Tools for Gene Expression Profile Analysis / 362 19.7 Real Case Studies / 364 19.8 Conclusions / 367 References / 368 20 MINING INFORMATIVE PATTERNS IN MICROARRAY DATA 371 Li Teng 20.1 Introduction / 371 20.2 Patterns with Similarity / 373 20.2.1 Similarity Measurement / 374 20.2.2 Clustering / 376 20.2.3 Biclustering / 379 20.2.4 Types of Biclusters / 380 20.2.5 Measurement of the Homogeneity / 383 20.2.6 Biclustering Algorithms with Different Searching Schemes / 387 20.3 Conclusion / 391 References / 391 21 ARROW PLOT AND CORRESPONDENCE ANALYSIS MAPS FOR VISUALIZING THE EFFECTS OF BACKGROUND CORRECTION AND NORMALIZATION METHODS ON MICROARRAY DATA 394 Carina Silva, Adelaide Freitas, Sara Roque, and Lisete Sousa 21.1 Overview / 394 21.1.1 Background Correction Methods / 395 21.1.2 Normalization Methods / 396 21.1.3 Literature Review / 397 21.2 Arrow Plot / 399 21.2.1 DE Genes Versus Special Genes / 399 21.2.2 Definition and Properties of the ROC Curve / 400 21.2.3 AUC and Degenerate ROC Curves / 401 21.2.4 Overlapping Coefficient / 402 21.2.5 Arrow Plot Construction / 403 21.3 Significance Analysis of Microarrays / 404 21.4 Correspondence Analysis / 405 21.4.1 Basic Principles / 405 21.4.2 Interpretation of CA Maps / 406 21.5 Impact of the Preprocessing Methods / 407 21.5.1 Class Prediction Context / 408 21.5.2 Class Comparison Context / 408 21.6 Conclusions / 412 Acknowledgments / 413 References / 413 VI PATTERN RECOGNITION IN PHYLOGENETIC TREES 417 22 PATTERN RECOGNITION IN PHYLOGENETICS: TREES AND NETWORKS 419 David A. Morrison 22.1 Introduction / 419 22.2 Networks and Trees / 420 22.3 Patterns and Their Processes / 424 22.4 The Types of Patterns / 427 22.5 Fingerprints / 431 22.6 Constructing Networks / 433 22.7 Multi-Labeled Trees / 435 22.8 Conclusion / 436 References / 437 23 DIVERSE CONSIDERATIONS FOR SUCCESSFUL PHYLOGENETIC TREE RECONSTRUCTION: IMPACTS FROM MODEL MISSPECIFICATION, RECOMBINATION, HOMOPLASY, AND PATTERN RECOGNITION 439 Diego Mallo, Agustín Sánchez-Cobos, and Miguel Arenas 23.1 Introduction / 440 23.2 Overview on Methods and Frameworks for Phylogenetic Tree Reconstruction / 440 23.2.1 Inferring Gene Trees / 441 23.2.2 Inferring Species Trees / 442 23.3 Influence of Substitution Model Misspecification on Phylogenetic Tree Reconstruction / 445 23.4 Influence of Recombination on Phylogenetic Tree Reconstruction / 446 23.5 Influence of Diverse Evolutionary Processes on Species Tree Reconstruction / 447 23.6 Influence of Homoplasy on Phylogenetic Tree Reconstruction: The Goals of Pattern Recognition / 449 23.7 Concluding Remarks / 449 Acknowledgments / 450 References / 450 24 AUTOMATED PLAUSIBILITY ANALYSIS OF LARGE PHYLOGENIES 457 David Dao, Tomáš Flouri, and Alexandros Stamatakis 24.1 Introduction / 457 24.2 Preliminaries / 459 24.3 A Naïve Approach / 462 24.4 Toward a Faster Method / 463 24.5 Improved Algorithm / 467 24.5.1 Preprocessing / 467 24.5.2 Computing Lowest Common Ancestors / 468 24.5.3 Constructing the Induced Tree / 468 24.5.4 Final Remarks / 471 24.6 Implementation / 473 24.6.1 Preprocessing / 473 24.6.2 Reconstruction / 473 24.6.3 Extracting Bipartitions / 474 24.7 Evaluation / 474 24.7.1 Test Data Sets / 474 24.7.2 Experimental Results / 475 24.8 Conclusion / 479 Acknowledgment / 481 References / 481 25 A NEW FAST METHOD FOR DETECTING AND VALIDATING HORIZONTAL GENE TRANSFER EVENTS USING PHYLOGENETIC TREES AND AGGREGATION FUNCTIONS 483 Dunarel Badescu, Nadia Tahiri, and Vladimir Makarenkov 25.1 Introduction / 483 25.2 Methods / 485 25.2.1 Clustering Using Variability Functions / 485 25.2.2 Other Variants of Clustering Functions Implemented in the Algorithm / 487 25.2.3 Description of the New Algorithm / 488 25.2.4 Time Complexity / 491 25.3 Experimental Study / 491 25.3.1 Implementation / 491 25.3.2 Synthetic Data / 491 25.3.3 Real Prokaryotic (Genomic) Data / 495 25.4 Results and Discussion / 501 25.4.1 Analysis of Synthetic Data / 501 25.4.2 Analysis of Prokaryotic Data / 502 25.5 Conclusion / 502 References / 503 VII PATTERN RECOGNITION IN BIOLOGICAL NETWORKS 505 26 COMPUTATIONAL METHODS FOR MODELING BIOLOGICAL INTERACTION NETWORKS 507 Christos Makris and Evangelos Theodoridis 26.1 Introduction / 507 26.2 Measures/Metrics / 508 26.3 Models of Biological Networks / 511 26.4 Reconstructing and Partitioning Biological Networks / 511 26.5 PPI Networks / 513 26.6 Mining PPI Networks—Interaction Prediction / 517 26.7 Conclusions / 519 References / 519 27 BIOLOGICAL NETWORK INFERENCE AT MULTIPLE SCALES: FROM GENE REGULATION TO SPECIES INTERACTIONS 525 Andrej Aderhold, V Anne Smith, and Dirk Husmeier 27.1 Introduction / 525 27.2 Molecular Systems / 528 27.3 Ecological Systems / 528 27.4 Models and Evaluation / 529 27.4.1 Notations / 529 27.4.2 Sparse Regression and the LASSO / 530 27.4.3 Bayesian Regression / 530 27.4.4 Evaluation Metric / 531 27.5 Learning Gene Regulation Networks / 532 27.5.1 Nonhomogeneous Bayesian Regression / 533 27.5.2 Gradient Estimation / 534 27.5.3 Simulated Bio-PEPA Data / 534 27.5.4 Real mRNA Expression Profile Data / 535 27.5.5 Method Evaluation and Learned Networks / 536 27.6 Learning Species Interaction Networks / 540 27.6.1 Regression Model of Species interactions / 540 27.6.2 Multiple Global Change-Points / 541 27.6.3 Mondrian Process Change-Points / 542 27.6.4 Synthetic Data / 544 27.6.5 Simulated Population Dynamics / 544 27.6.6 Real World Plant Data / 546 27.6.7 Method Evaluation and Learned Networks / 546 27.7 Conclusion / 550 References / 550 28 DISCOVERING CAUSAL PATTERNS WITH STRUCTURAL EQUATION MODELING: APPLICATION TO TOLL-LIKE RECEPTOR SIGNALING PATHWAY IN CHRONIC LYMPHOCYTIC LEUKEMIA 555 Athina Tsanousa, Stavroula Ntoufa, Nikos Papakonstantinou, Kostas Stamatopoulos, and Lefteris Angelis 28.1 Introduction / 555 28.2 Toll-Like Receptors / 557 28.2.1 Basics / 557 28.2.2 Structure and Signaling of TLRs / 558 28.2.3 TLR Signaling in Chronic Lymphocytic Leukemia / 559 28.3 Structural Equation Modeling / 560 28.3.1 Methodology of SEM Modeling / 560 28.3.2 Assumptions / 561 28.3.3 Estimation Methods / 562 28.3.4 Missing Data / 562 28.3.5 Goodness-of-Fit Indices / 563 28.3.6 Other Indications of a Misspecified Model / 565 28.4 Application / 566 28.5 Conclusion / 580 References / 581 29 ANNOTATING PROTEINS WITH INCOMPLETE LABEL INFORMATION 585 Guoxian Yu, Huzefa Rangwala, and Carlotta Domeniconi 29.1 Introduction / 585 29.2 Related Work / 587 29.3 Problem Formulation / 589 29.3.1 The Algorithm / 591 29.4 Experimental Setup / 592 29.4.1 Data sets / 592 29.4.2 Comparative Methods / 593 29.4.3 Experimental Protocol / 594 29.4.4 Evaluation Criteria / 594 29.5 Experimental Analysis / 596 29.5.1 Replenishing Missing Functions / 596 29.5.2 Predicting Unlabeled Proteins / 600 29.5.3 Component Analysis / 604 29.5.4 Run Time Analysis / 604 29.6 Conclusions / 605 Acknowledgments / 606 References / 606 INDEX 609

Autorenportrait

Mourad Elloumi, PhD, is Professor in Computer Science at the University of Tunis-El Manar, Tunisia. Dr. Elloumi is the author/co-author of more than 50 publications in international journals and conference proceedings related to Algorithmics, Computational Molecular Biology, and Knowledge Discovery and Data Mining. Costas S. Iliopoulos, PhD, is Professor of AlgorithmDesign at King's College London, UK. Dr. Iliopoulos co-authored over 300 peer-reviewed articles in pattern matching and combinatorics of strings. He serves on the editorial board of the Journal of Discrete Algorithms, Computer Mathematics & Combinatorial Computing, and System Biology & Biomedical Technologies. Jason T. L. Wang, PhD, is Professor of Computer Science at the New Jersey Institute of Technology, USA. Dr. Wang has published extensively on Data Mining and Computational Molecular Biology, and has been a member of program committees for over 200 conferences and workshops in these and related areas. Albert Y. Zomaya, PhD, is the Chair Professor of High Performance Computing & Networking in the School of Information Technologies, University of Sydney, Australia. Dr. Zomaya published more than 500 scientific papers and articles and is author, co-author or editor of more than 20 books. Dr. Zomaya is Fellow of AAAS, IEEE, and IET.

Back cover copy

A comprehensive overview of high-performance pattern recognition techniques and approaches to Computational Molecular Biology This book surveys the developments of techniques and approaches on pattern recognition related to Computational Molecular Biology. Providing a broad coverage of the field, the authors cover fundamental and technical information on these techniques and approaches, as well as discussing their related problems. The text consists of twenty nine chapters, organized into seven parts: Pattern Recognition in Sequences, Pattern Recognition in Secondary Structures, Pattern Recognition in Tertiary Structures, Pattern Recognition in Quaternary Structures, Pattern Recognition in Microarrays, Pattern Recognition in Phylogenetic Trees, and Pattern Recognition in Biological Networks. <ul> <li>Surveys the development of techniques and approaches on pattern recognition in biomolecular data</li> <li>Discusses pattern recognition in primary, secondary, tertiary and quaternary structures, as well as microarrays, phylogenetic trees and biological networks</li> <li>Includes case studies and examples to further illustrate the concepts discussed in the book</li> </ul> Pattern Recognition in Computational Molecular Biology: Techniques and Approaches is a reference for practitioners and professional researches in Computer Science, Life Science, and Mathematics. This book also serves as a supplementary reading for graduate students and young researches interested in Computational Molecular Biology.