Table of Contents
Title Page
Copyright
Dedication
Foreword
List of Contributors
Chapter 1: Introduction
1.1 The Rationale for the Books
1.2 Development of the Field
1.3 The Basis of Chemoinformatics and the Diversity of Applications
Reference
Chapter 2: QSAR/QSPR
2.1 Introduction
2.2 Data Handling and Curation
2.3 Molecular Descriptors
2.4 Methods for Data Analysis
2.5 Classification Methods
2.6 Methods for Data Modeling
2.7 Summary on Data Analysis Methods
2.8 Model Validation
2.9 Regulatory Use of QSARs
Selected Reading
Reference
Chapter 3: Prediction of Physicochemical Properties of Compounds
3.1 Introduction
3.2 Overview of Modeling Approaches to Predict Physicochemical Properties
3.3 Methods for the Prediction of Individual Properties
3.4 Limitations of Statistical Methods
3.5 Outlook and Perspectives
Selected Reading
References
Chapter 4: Chemical Reactions
Chapter 4.1: Chemical Reactions – An Introduction
References
Chapter 4.2: Reaction Prediction and Synthesis Design
4.2.1 Introduction
4.2.2 Reaction Prediction
4.2.3 Synthesis Design
4.2.4 Conclusion
References
Chapter 4.3: Explorations into Biochemical Pathways
4.3.1 Introduction
4.3.2 The BioPath.Database
4.3.3 BioPath.Explore
4.3.4 Search Results
4.3.5 Exploitation of the Information in BioPath.Database
4.3.6 Summary
Selected Reading
References
Chapter 5: Structure–Spectrum Correlations and Computer-Assisted Structure Elucidation
5.1 Introduction
5.2 Molecular Descriptors
5.3 Infrared Spectra
5.4 NMR Spectra
5.5 Mass Spectra
5.6 Computer-Aided Structure Elucidation (CASE)
Selected Reading
Acknowledgement
References
Chapter 6.1: Drug Discovery: An Overview
6.1.1 Introduction
6.1.2 Definitions of Some Terms Used in Drug Design
6.1.3 The Drug Discovery Process
6.1.4 Bio- and Chemoinformatics Tools for Drug Design
6.1.5 Structure-based and Ligand-Based Drug Design
6.1.6 Target Identification and Validation
6.1.7 Lead Finding
6.1.8 Lead Optimization
6.1.9 Preclinical and Clinical Trials
6.1.10 Outlook: Future Perspectives
Selected Reading
References
Chapter 6.2: Bridging Information on Drugs, Targets, and Diseases
6.2.1 Introduction
6.2.2 Existing Data Sources
6.2.3 Drug Discovery Use Cases in Computational Life Sciences
6.2.4 Discussion and Outlook
Selected Reading
References
Chapter 6.3: Chemoinformatics in Natural Product Research
6.3.1 Introduction
6.3.2 Potential and Challenges
6.3.3 Access to Software and Data
6.3.4 In Silico Driven Pharmacognosy-Hyphenated Strategies
6.3.5 Opportunities
6.3.6 Miscellaneous Applications
6.3.7 Limits
6.3.8 Conclusion and Outlook
Selected Reading
References
Chapter 6.4: Chemoinformatics of Chinese Herbal Medicines
6.4.1 Introduction
6.4.2 Type 2 Diabetes: The Western Approach
6.4.3 Type 2 Diabetes: The Chinese Herbal Medicines Approach
6.4.4 Building a Bridge
6.4.5 Screening Approach
Selected Reading
References
Chapter 6.5: PubChem
6.5.1 Introduction
6.5.2 Objectives
6.5.3 Architecture
6.5.4 Data Sources
6.5.5 Submission Processing and Structure Representation
6.5.6 Data Augmentation
6.5.7 Preparation for Database Storage
6.5.8 Query Data Preparation and Structure Searching
6.5.9 Structure Query Input
6.5.10 Query Processing
6.5.11 Getting Started with PubChem
6.5.12 Web Services
6.5.13 Conclusion
References
Chapter 6.6: Pharmacophore Perception and Applications
6.6.1 Introduction
6.6.2 Historical Development of the Modern Pharmacophore Concept
6.6.3 Representation of Pharmacophores
6.6.4 Pharmacophore Modeling
6.6.5 Application of Pharmacophores in Drug Design
6.6.6 Software for Computer-Aided Pharmacophore Modeling and Screening
6.6.7 Summary
Selected Reading
References
Chapter 6.7: Prediction, Analysis, and Comparison of Active Sites
6.7.1 Introduction
6.7.2 Active Site Prediction Algorithms
6.7.3 Target Prioritization: Druggability Prediction
6.7.4 Search for Sequentially Homologous Pockets
6.7.5 Target Comparison: Virtual Active Site Screening
6.7.6 Summary and Outlook
Selected Reading
References
Chapter 6.8: Structure-Based Virtual Screening
6.8.1 Introduction
6.8.2 Docking Algorithms
6.8.3 Scoring
6.8.4 Structure-Based Virtual Screening Workflow
6.8.5 Protein-Based Pharmacophoric Filters
6.8.6 Validation
6.8.7 Summary and Outlook
Selected Reading
References
Chapter 6.9: Prediction of ADME Properties
6.9.1 Introduction
6.9.2 General Consideration on SPR/QSPR Models
6.9.3 Estimation of Aqueous Solubility (log S)
6.9.4 Estimation of Blood–Brain Barrier Permeability (log BB)
6.9.5 Estimation of Human Intestinal Absorption (HIA)
6.9.6 Other ADME Properties
6.9.7 Summary
Selected Reading
References
Chapter 6.10: Prediction of Xenobiotic Metabolism
6.10.1 Introduction: The Importance of Xenobiotic Biotransformation in the Life Sciences
6.10.2 Biotransformation Types
6.10.3 Brief Review of Methods
6.10.4 User Needs: Scientists Use Metabolism Information in Different Ways
6.10.5 Case Studies
Selected Reading
References
Chapter 6.11: Chemoinformatics at the CADD Group of the National Cancer Institute
6.11.1 Introduction and History
6.11.2 Chemical Information Services
6.11.3 Tools and Software
6.11.4 Synthesis and Activity Predictions
6.11.5 Downloadable Datasets
References
Chapter 6.12: Uncommon Data Sources for QSAR Modeling
6.12.1 Introduction
6.12.2 Observational Metadata and QSAR Modeling
6.12.3 Pharmacovigilance and QSAR
6.12.4 Conclusions
Selected Reading
References
Chapter 6.13: Future Perspectives of Computational Drug Design
6.13.1 Where Do the Medicines of the Future Come from?
6.13.2 Integrating Design, Synthesis, and Testing
6.13.3 Toward Precision Medicine
6.13.4 Learning from Nature: From Complex Templates to Simple Designs
6.13.5 Conclusions
Selected Reading
References
Chapter 7: Computational Approaches in Agricultural Research
7.1 Introduction
7.2 Research Strategies
7.3 Estimation of Adverse Effects
7.4 Conclusion
Selected Reading
References
Chapter 8: Chemoinformatics in Modern Regulatory Science
8.1 Introduction
8.2 Data Gap Filling Methods in Risk Assessment
8.4 New Approach Descriptors
8.5 Chemical Space Analysis
8.6 Summary
Selected Reading
References
Chapter 9: Chemometrics in Analytical Chemistry
9.1 Introduction
9.2 Sources of Data: Data Preprocessing
9.3 Data Analysis Methods
9.4 Validation
9.5 Applications
9.6 Outlook and Prospects
Selected Reading
References
Chapter 10: Chemoinformatics in Food Science
10.1 Introduction
10.2 Scope of Chemoinformatics in Food Chemistry
10.3 Molecular Databases of Food Chemicals
10.4 Chemical Space of Food Chemicals
10.5 Structure–Property Relationships
10.6 Computational Screening and Data Mining of Food Chemicals Libraries
10.7 Conclusion
Selected Reading
References
Chapter 11: Computational Approaches to Cosmetics Products Discovery
11.1 Introduction: Cosmetics Demands on Computational Approaches
11.2 Case I: The Multifunctional Role of Ectoine as a Natural Cell Protectant (Product: Ectoine, “Cell Protection Factor”, and Moisturizer)
11.3 Case II: A Smart Cyclopeptide Mimics the RGD Containing Cell Adhesion Proteins at the Right Site (Product: Cyclopeptide-5: Antiaging)
11.4 Conclusions: Cases I and II
References
Chapter 12: Applications in Materials Science
12.1 Introduction
12.2 Why Materials Are Harder to Model than Molecules
12.3 Why Are Chemoinformatics Methods Important Now?
12.4 How Do You Describe Materials Mathematically?
12.5 How Well do Chemoinformatics Methods Work on Materials?
12.6 What Are the Pitfalls when Modeling Materials?
12.7 How Do You Make Good Models and Avoid the Pitfalls?
12.8 Materials Examples
12.9 Biomaterials Examples
12.10 Perspectives
Selected Reading
References
Chapter 13: Process Control and Soft Sensors
13.1 Introduction
13.2 Roles of Soft Sensors
13.3 Problems with Soft Sensors
13.4 Adaptive Soft Sensors
13.5 Database Monitoring for Soft Sensors
13.6 Efficient Process Control Using Soft Sensors
13.7 Conclusions
Selected Readings
References
Chapter 14: Future Directions
14.1 Well-Established Fields of Application
14.2 Emerging Fields of Application
14.3 Renaissance of Some Fields
14.4 Combined Use of Chemoinformatics Methods
14.5 Impact on Chemical Research
Index
End User License Agreement
List of Illustrations
Chapter 1: Introduction
Figure 1.1 Fundamental questions of a chemist and the chemoinformatics methods that can be used in providing support for solving these tasks.
Chapter 2: QSAR/QSPR
Figure 2.1 The general QSAR/QSPR procedure.
Figure 2.2 Flow chart showing the general steps for generating QSAR models.
Figure 2.3 Example of a simple decision tree.
Figure 2.4 ANN, example of a two-layered ANN (there are only two layers of weights!).
Figure 2.5 General modeling/validation workflow.
Figure 2.6 PCA score projection of Iris dataset objects on the plane defined by principal components PC1 and PC2; (○) Iris setosa, (□) Iris versicolor, (◊) Iris virginica.
Figure 2.7 Example of the distribution of objects selected by the CADEX method. A PCA score projection of the Iris dataset objects on the plane defined by PC1 and PC2; open and filled symbols represent training and test sets, respectively.
Figure 2.8 Schematic representation of the chemical space for a set of compounds described by the first three principal components (PC1-3). The test set molecules (grey balls) are located within the applicability domain covered by the molecules of the training set (black balls).
Figure 2.9 Modeling/validation workflow. The scope of the external validation is shown in gray color; the scope of the inner validation over the validation set is shown in cross-hatching. The internal validation methods are shown with a bold frame.
Figure 2.10 Two possible ways to perform cross-validation.
Figure 2.11 Example ROC curve (bold line) for a binary classification problem. The straight line is obtained by guessing the class membership.
Chapter 3: Prediction of Physicochemical Properties of Compounds
Figure 3.1 Functional groups for alkanes according to Benson's [7] notation.
Figure 3.2 RMSE of different models as a function of the number of non-hydrogen atoms (NHA) in molecules.
Figure 3.3 The model prediction errors for five sets of compounds are shown as a function of temperature. The model errors for the PATENTS sets match the experimental accuracy of the data in this set.
Figure 3.4 Microspecies and constants using the example of cetirizine. The microspecies are represented as triplets, where the first position refers to the hydroxyl group of the carboxylic acid group, the second one refers to the middle nitrogen atom, and the third position refers to the nitrogen atom farthest away from the carboxylic group; for example, •○• represents the zwitterionic form with one proton bound to the middle nitrogen, the dominant neutral form of cetirizine. (a) Cetirizine. Protonation sites (OH, N, N) in bold face. (b) Protonation scheme. Cetirizine has n = 3 protonation sites, and thus 2³ = 8 microstates and 3 × 2² = 12 microequilibria. The 3 + 1 = 4 macrostates are shown below, with h = number of bound hydrogens. (c) Distribution of microspecies as a function of pH. The microspecies •○○ and ○○• are very close to the baseline. (d) Microconstants. All values are experimentally determined.
Figure 3.5 Reaction equation for the ionization of aliphatic carboxylic acids with the physicochemical effects indicated: α_O is effective polarizability, Q_σ is inductive effect on the ionizable atom, A_2D is steric hindrance at the ionization site, and χ_π is electronegativity at the π-carbon atom.
Chapter 4: Chemical Reactions
Figure 4.1 Different types of problems encountered in dealing with chemical reactions.
Figure 4.2 Intermediate in the synthesis of maitotoxin, one of the most complex targets for total synthesis. The light grey parts of the molecule show the regions that have been synthesized [6]. Much of the hard work has been completed. However, joining these fragments together will require substantial work, as will completing the remaining fragments of the molecule.
Figure 4.3 Eribulin/Halaven. The most complex molecule synthesized and sold.
Figure 4.4 Acid-catalyzed rearrangement to dolabriferol.
Figure 4.5 An overview of the methods in WODCA.
Figure 4.6 Biochemical Pathways wall chart (https://www.roche.com/pathways [3] – accessed January 2018).
Figure 4.7 Details on a reaction on the Biochemical Pathways wall chart.
Figure 4.8 Positions indicating the occurrence of L-glutamate on the wall chart.
Figure 4.9 Different views on a biochemical reaction.
Figure 4.10 Details on the query molecule chorismate.
Figure 4.11 Results of a search for monooxygenases in the “Reaction” field.
Figure 4.12 Results for searching for enzyme EC “1.13.12.4.”
Figure 4.13 Shortest pathway from farnesyl-diphosphate to artemisinin.
Figure 4.14 Section of the full SOM of the 135 reactions that shows the distribution of reactions of the subsubclasses of EC 3.1.c.d.
Figure 4.15 SOM of reactions from all classes of enzymes from EC 1.b.c.d. to EC 6.b.c.d.
Figure 4.16 Effect of an enzyme on the energies of the substrate, the transition state, the intermediate, and the product of a reaction. The substantial lowering of the energies of the transition states and the intermediate is clearly distinguished.
Figure 4.17 The intermediate of the conversion of AMP to IMP obtained by addition of water to AMP. The structure of the inhibitor of AMP deaminase, carbocyclic coformycin.
Figure 4.18 Superimpositions of the 3D structure of the inhibitor carbocyclic coformycin onto the substrate AMP, the intermediate, and the product IMP of the deamination of AMP.
Figure 4.19 Outline of the method for uncovering metabolic pathways relevant to phenotypic traits of microbial genomes.
Figure 4.20 Outline of the chemoinformatics analysis of the reverse pathway engineering approach.
Figure 4.21 Three pathways to 3-methylbutanoic acid, generated by the RPE approach. The sequences (a) and (b) are known; sequence (c) is novel.
Figure 4.22 Suggested reaction from the sequence in Figure 4.21c and the reference reaction from BioPath.Database.
Figure 4.23 Tree of the L-lactate oxidase (LOX) homologs from lactic acid bacteria (LAB).
Chapter 5: Structure–Spectrum Correlations and Computer-Assisted Structure Elucidation
Figure 5.1 The HOSE code for a selected carbon atom describes its structural environment in hierarchical order by walking through the molecule in spheres. Only the non-hydrogen atoms are explicitly considered.
Figure 5.2 RDF code for a phosphonic ester using atomic numbers as atom property.
Figure 5.3 Training of a CPG NN to learn relationships between structures and IR spectra and example of a simulated spectrum.
Figure 5.4 Scheme of a counterpropagation network for the derivation of 3D structures.
Figure 5.5 Radial distribution function for proton H-6 using partial atomic charge as the atomic property and indications of the distances contributing to each peak.
Figure 5.6 Experimental ¹H NMR spectrum (below) of the structure in the upper right corner compared to the full spectrum predicted by SPINUS (above) for the same structure. (*) In the experimental spectrum, the signal at 7.26 ppm is from the solvent (CDCl₃), the signal at 5.98 ppm is from the exchangeable NH proton, and the peaks at 2.85–2.95 ppm are from residues of DMF.
Figure 5.7 Screen capture of Mnova NMR commercial software (Mestrelab Research, S.L, www.mestrelab.com). In this example, the selected active page contains one experimental ¹H NMR spectrum, fully processed and analyzed, stacked together with its ¹H NMR predicted counterpart. Automatic assignment of the signals (possible with ¹H, ¹³C, HSQC, and COSY experiments) is also depicted; atom labels are color-coded depending upon the quality of the assignment as derived from a fuzzy logic expert system.
Figure 5.8 Screenshot of Mass Frontier software (HighChem, Ltd., www.highchem.com) showing predicted fragmentation mechanisms for a user provided compound.
Figure 5.9 Screenshot of ACD/Structure Elucidator Suite, Version 2016.1, Advanced Chemistry Development, Inc., Toronto, ON, Canada, www.acdlabs.com.
Chapter 6.1: Drug Discovery: An Overview
Figure 6.1.1 The drug discovery process.
Figure 6.1.2 Distribution of drug targets.
Figure 6.1.3 Kohonen map (10 × 7) obtained from a dataset of 112 dopamine agonists and 60 benzodiazepine agonists.
Figure 6.1.4 Kohonen map (40 × 30) of a dataset consisting of the dopamine and benzodiazepine agonists of Figure 6.1.3 and 8,223 compounds of a chemical supplier catalog.
Figure 6.1.5 Structures that were mapped into the neuron at position 5,8 of the Kohonen map of Figure 6.1.4.
Figure 6.1.6 Synthesis and high-throughput screening results of a library of hydantoins.
Figure 6.1.7 SOMs of a library of 5,513 hydantoins obtained through six different structure representations. ESP, electrostatic potential; HBP, hydrogen-bonding potential; HYP, hydrophobicity potential. Neurons that obtained a hit in dark gray; neurons with only non-hits in light gray.
Figure 6.1.8 Development of a filter for hits in the hydantoin library. (a) SOM of the training set, (b) classification map obtained from the neurons with hits and their first-sphere neighbor neurons, and (c) SOM of the test set.
Figure 6.1.9 Flexible superimposition of the 3D structure of three muscle relaxants: chlorpromazine, tolperisone, and tizanidine.
Figure 6.1.10 Different strategies to design a ligand in target-based drug discovery: docking (left), building (center), and linking (right). D = H-bond donor, A = H-bond acceptor, H1, H2 = hydrophobic regions of the protein.
Figure 6.1.11 Thermodynamics of ligand binding.
Figure 6.1.12 Factors affecting lead identification and optimization.
Figure 6.1.13 Compounds showing baseline toxicity (narcosis).
Figure 6.1.14 Compounds having a variety of toxic modes of action (MOA).
Figure 6.1.15 Architecture of a counterpropagation neural network for classifying phenols into four different MOAs.
Figure 6.1.16 Distribution of phenols in the four output layers of the counterpropagation network.
Figure 6.1.17 Screen of ChemoTyper: on the left are three chemotypes of the thalidomide skeleton with chemotypes differentiated by sigma charge (light gray) and total charge (dark gray); on the right-hand side is part of a dataset that indicates hits for the two different chemotypes.
Chapter 6.3: Chemoinformatics in Natural Product Research
Figure 6.3.1 Selection of appropriate modeling tools depending on the aim of the study.
Figure 6.3.2 Depending on the selected methods, theoretical validation experiments are necessary to select the best performing models for making predictions.
Figure 6.3.3 Virtual screening workflow including additional filtering and selection criteria.
Figure 6.3.4 Implementation of chemoinformatics in natural product research.
Figure 6.3.5 Virtual screening approach for the identification of novel active constituents for the target of interest.
Figure 6.3.6 Perlatolic acid fitted into a pharmacophore model for mPGES-1 inhibitors. Chemical features are color-coded: hydrophobic, gray; negatively ionizable group, dark blue; aromatic ring, brown and blue plane. A steric restriction (cyan shape) is depicted as a light gray cloud.
Figure 6.3.7 Target fishing approach for the identification of macromolecular targets for a specific compound.
Figure 6.3.8 Computational methods as tools to provide insight into the molecular ligand binding interaction.
Figure 6.3.9 Computational methods for the selection of plant material as promising starting point for experimental investigations.
Chapter 6.4: Chemoinformatics of Chinese Herbal Medicines
Figure 6.4.1 Creating a herbal prescription database, which contains data regarding herbs and their chemical structures of active constituents against “Xiaoke.”
Figure 6.4.2 Creating an anti-T2D compound database (ADB) from the scientific literature, which contains the chemical structures of anti-T2D agents, their mechanisms, and targets.
Figure 6.4.3 HCMN. The network demonstrates the relations among herbs, chemotypes, and mechanisms of actions.
Figure 6.4.4 Radix Rehmanniae (H02) is one of the top herbs for treating T2D; its main active constituents have cinnamyl-acid-like common fragments/chemotypes, which are associated with mechanism group M06, particularly for ALR2 target. Library for virtual screening. The compounds selected for the virtual library are based on cinnamyl-acid-like scaffolds/mimics.
Figure 6.4.5 Discovering new anti-T2D agents. The virtual hits are confirmed by chemistry and bioassays.
Figure 6.4.6 Cinnamyl-acid-like compounds as anti-T2D agents (ALR2 inhibitors).
Figure 6.4.7 Epalrestat.
Figure 6.4.8 Anti-T2D agents derived from Dihuang and Huangqi. Cinnamyl-acid-like chemotypes are in bold. Glycosides are colored in light grey.
Chapter 6.6: Pharmacophore Perception and Applications
Figure 6.6.1 p-Aminobenzoic acid (PABA) and p-aminobenzenesulfonamide are isosteres and show similarities regarding interatomic distances that are critical for binding to the dihydrofolate reductase enzyme surface [2]. Binding of the sulfonamide instead of PABA thus inhibits the biosynthesis of tetrahydrofolic acid.
Figure 6.6.2 Analogy between estradiol and trans-diethylstilbestrol [2].
Figure 6.6.3 Interaction capabilities of natural (R)-(−)-adrenaline (a) and its stereoisomer (S)-(+)-adrenaline (b) [2].
Figure 6.6.4 Hydrogen bonding geometry: the involved N, H, and O atoms are nearly linearly aligned. The N–O distance is typically between 2.8 and 3.2 Å. The N–H–O angle is >150° and the C=O–H angle between 100° and 180°.
Figure 6.6.5 On the formation of lipophilic contacts (hydrophobic interactions), water molecules covering lipophilic areas of the binding pocket are forced to move to the outside of the ligand–receptor complex. This increases the entropy of the system due to a gain in mobility of the water molecules. The resulting contribution to the binding affinity is typically between −100 and −200 J/mol per Å² of the lipophilic contact surface.
Figure 6.6.6 Steric configurations of π–π and cation–π interactions [25].
Figure 6.6.7 Thermolysin in complex with the hydroxamic acid inhibitor N-[(2S)-2-benzyl-3-(hydroxyamino)-3-oxopropanoyl]-L-alanyl-N-(4-nitrophenyl)glycinamide (BAN, PDB-code: 5TLN). The Zn²⁺ ion is penta-coordinated with the characteristic amino acids Glu166, His142, and His146 of thermolysin, and the hydroxyl- and carbonyl-oxygen of the hydroxamic acid moiety [27].
Figure 6.6.8 Receptor-based pharmacophore generated by LigandScout for the CDK2/inhibitor complex 1KE9. Gray spheres represent exclusion volumes that model the shape of the receptor surface. Yellow spheres represent hydrophobic, green arrows hydrogen bond donor, and red arrows hydrogen bond acceptor features. The blue spherical star represents a positive ionizable group in an ionic interaction.
Figure 6.6.9 Ligand-based pharmacophore modeling workflow starting from a set of known actives.
Figure 6.6.10 (a) Selection of n molecules from a database with N entries. (b) ROC curves for an ideal, an overlapping, and a random distribution of actives and decoys.
Figure 6.6.11 (a) 3D pharmacophore derived from the TLR2 binding site. (b) Binding mode of the TLR2 antagonist discovered by pharmacophore-based virtual screening.
Figure 6.6.12 3D pharmacophore and dynophore of kaempferol bound to SULT1E1. (a) Static view of kaempferol bound to SULT1E1 with depicted 3D pharmacophore. (b) Kaempferol is represented with the resulting dynophore shown as spatial point clouds.
Chapter 6.7: Prediction, Analysis, and Comparison of Active Sites
Figure 6.7.1 (a) Protein structure (gray) with unbound potential ligand (blue) in surface representation. (b) Prediction of potential binding sites (yellow, red, blue). (c) Protein–ligand complex structure, where ligand binds to the yellow binding site.
Figure 6.7.2 Exemplary illustration of pocket detection methods vertically grouped into grid-based and grid-free approaches and horizontally separated into geometry- and energy-based methods. Republished with permission of Future Science Group, from Future Med. Chem. (2014) 6(3), 319–31; permission conveyed through Copyright Clearance Center, Inc.
Figure 6.7.3 DoGSiteScorer-based model building and druggability prediction: First, pockets were predicted for all structures of the DD dataset. Descriptors were calculated and discriminative features were selected, based on which a SVM model was trained. Finally, this model can be used for druggability predictions of novel target structures.
Figure 6.7.4 (a) Binding site of monomeric cyclin-dependent kinase 2 (PDB code 1AQ1) [74]. (b) Four identical subunits of HIV-1 protease forming two symmetrical binding sites (PDB code 5KR2) [75].
Figure 6.7.5 Depiction of the structural triangle descriptor used in TrixP. (A) Example of the TrixP descriptor with two hydrogen-bond donors (blue) as well as an apolar point (yellow) as triangle corners. (B) Schematic superposition of two different binding sites based on a matching descriptor with a hydrogen-bond donor (blue), a hydrogen-bond acceptor (red), and an apolar point (yellow) as triangle corners. (a,b) show identical descriptors in two different binding sites and (c) shows the respective superposition of the binding sites based on the matching descriptors.
Chapter 6.8: Structure-Based Virtual Screening
Figure 6.8.1 Interactions in a protein–ligand complex (PDB code: 1SQN). The major energetic contributions result from hydrogen bonds and hydrophobic effect. The protein surface is colored according to hydrophobicity (dark gray, hydrophilic atoms; white, hydrophobic atoms). The dominating interaction in the complex of norethindrone with the progesterone receptor is the hydrophobic effect, which is caused by the burial of the four aliphatic rings of the ligand in a deep hydrophobic pocket of the protein. The two hydrogen bonds (left side) contribute less to the overall binding affinity; they rather assist in orientating the ligand in the active site.
Figure 6.8.2 Best practice SBVS workflow.
Figure 6.8.3 Score histograms (a) allow an intuitive assessment of a docking program's sensitivity and specificity for a defined cutoff. Enrichment plots (b) and ROC curves (c) are used to assess a docking program's quality. The example shows two hypothetical docking runs for a library of 10,000 compounds containing 100 active ligands.
Chapter 6.9: Prediction of ADME Properties
Figure 6.9.1 Performance of classification models on blood–brain barrier permeability.
Chapter 6.10: Prediction of Xenobiotic Metabolism
Figure 6.10.1 Generalized scheme of the consequences of metabolic biotransformation.
Figure 6.10.2 (a) Simplified catalytic cycle for monooxygenation effecting the overall conversion:
. (b) The CYP iron-heme prosthetic group – the enzyme's catalytic center.
Figure 6.10.3 Indomethacin and a number of its amide derivatives (from Ref. [13]) with half-life values in minutes (t₁/₂) indicated for rat and human liver microsomes. Preferred sites of metabolism for each analog as estimated by the MetaSite program are indicated by a grey circle. (Adapted from Marchant et al. 2016 [13].)
Figure 6.10.4 The structure of quetiapine (6). Areas of metabolic liability are indicated by grey spheres: O-dealkylation, N-dealkylation, sulfur oxidation, and carboaromatic hydroxylation.
Figure 6.10.5 In vivo (pig) and in vitro (human and porcine liver microsomes) metabolism of 25B-NBOMe.
Figure 6.10.6 Top five observed sites of metabolism as predicted by Meteor Nexus. The annotated sites of metabolism (SoMs) are indicated in Table 6.10.4.
Figure 6.10.7 Putative pathways of toxification suggested by a Meteor Nexus analysis of 25B-NBOMe. Potentially adduct-forming intermediates are depicted in light grey (18, 20, 21, 23).
Chapter 6.11: Chemoinformatics at the CADD Group of the National Cancer Institute
Figure 6.11.1 Web form of the Enhanced NCI Database Browser.
Figure 6.11.2 Possible workflows of CIR queries.
Figure 6.11.3 CSLS results page (excerpt) for search with query string “740.”
Figure 6.11.4 Web form of the GIF/PNG Creator web service.
Chapter 6.12: Uncommon Data Sources for QSAR Modeling
Figure 6.12.1 The growth of publications on QSAR modeling correlates with the accumulation of experimental data. The chart is generated by Google Ngram Viewer (http://books.google.com/ngrams); Y-axis – percentage among all books in the Google Ngram.
Figure 6.12.2 Schematic workflow showing the use of multiple data sources for developing, interpreting, and validating QSAR models that classify drugs as SJS-active or inactive. VigiBase provided 364 drugs whose chemical structures were used as variables for QSAR modeling. QSAR models provided structural alerts for interpretation and predicted potential SJS actives and inactives in DrugBank. Finally, the predicted actives and inactives were evaluated for evidence of SJS activity or lack thereof in VigiBase, ChemoText, and Micromedex (see text for additional discussion).
Chapter 6.13: Future Perspectives of Computational Drug Design
Figure 6.13.1 Schematic of an integrated design, synthesis, and screening platform illustrating the fully automated process with a feedback loop for adaptive compound optimization. An adaptive quantitative structure–activity relationship model guides the compound design and selection process. The diagram on the right illustrates an on-chip microreactor platform as a prototype of fully integrated future design–synthesize–test instruments for drug discovery. The example depicts a module for reductive amination.
Figure 6.13.2 Examples of computationally de novo designed and chemically synthesized bioactive compounds, taken from recent publications.
Figure 6.13.3 Ligand design in computed fitness landscapes. The process starts with virtual compound enumeration (upper left) and visualization of the populated chemical space (upper right). Then, computed target activities are highlighted and suitable compounds identified, shown here for ligand selectivity for human sigma-1 and D4 receptors (lower right). In the landscapes, the coloring indicates regions of chemical space with a high (dark gray) and low (light gray) probability of finding ligands of the respective receptor.
Figure 6.13.4 From a known drug (Fasudil 4) to a de novo generated mimetic agent by computational fragment assembly. The cartoon illustrates the complex between the computer-generated ligand 5 and its macromolecular target, death-associated protein kinase 3 (PDB-ID: 5a6n). Essential hydrogen bridges are shown as dashed lines.
Figure 6.13.5 From a complex natural product template to a synthetically easily accessible mimetic by computer-assisted de novo design. Morphing of the anticancer natural product (−)-englerin A into an isofunctional compound with a different scaffold enabled the discovery of a novel class of potent and selective inhibitors of transient receptor potential (TRP) M8 calcium channels.
Chapter 7: Computational Approaches in Agricultural Research
Figure 7.1 Chemical structures and superimposed X-ray coordinates of 1,2-diphenylethane (dark, CSD-code DIBENZ04) and benzyloxybenzene (bright, CSD-code MUYDOZ) indicating the different orientations of one phenyl ring induced by substitution of methylene with an ether function.
Figure 7.2 Superposition of a Protox inhibitor of the pyridinedione type on a calculated protoporphyrinogen-like template (cyan). For clarity, corresponding ring systems are indicated and hydrogen atoms are omitted. Atoms are color coded as follows: carbon gray, nitrogen blue, oxygen red, sulfur yellow, and chlorine green.
Figure 7.3 Common interaction pattern of potent Protox inhibitors of the uracil (left) and pyridine types. Each molecule comprises two ring systems and electron-rich functions on both sides of the linked rings (colored blue and red).
Figure 7.4 Pharmacophore model of 318 Protox inhibitors (color code as in Figure 7.2).
Figure 7.5 Graph indicates the correlation of experimental and predicted IC50 values yielded by a “leave-one-out” cross-validation (q² = 0.95) for the pharmacophore model shown in Figure 7.4.
Figure 7.6 Contour map derived from a 3D-QSAR study. Clouds indicate favorable space to be occupied by potent Protox inhibitors. While the highly active imidazolinone derivative (a) fits almost perfectly, the ethylcarboxylate residue of the weaker ligand protrudes from the preferred region (b).
Figure 7.7 Pseudoreceptor model for insecticidal ryanodine derivatives constructed with the program PrGen [4]. The binding site model is composed of seven amino acid residues and contains the structure of ryanodine [5]. Hydrogen bond interactions are indicated with dashed lines.
Figure 7.8 Protocol of a classical docking and scoring procedure. The binding site cavity is characterized via, for example, hydrophobic (circles), hydrogen bond donor (lines), and hydrogen bond acceptor properties (circle segment). Each compound of a database (or real library) is flexibly docked into the binding site, and the free binding energy [kJ/mol] for each of the derived poses is estimated by a mathematical scoring function.
Figure 7.9 X-ray crystallographically determined binding site of Protox [15] including the co-crystallized inhibitor INH (for the structural formula, see Figure 7.10) and a part of the cofactor FAD. Highlighted is Arg98 at the entrance of the binding site cavity, which interacts with INH and with almost all solutions of the FlexX approach via electrostatic and hydrogen bond interactions. Two docking poses, each representing a cluster of solutions, are indicated: one outside and one inside the binding site cavity (orange-colored carbon atoms).
Figure 7.10 Structural formula of INH and comparison of the poses derived from FlexX docking (single colored) and crystallization experiment (thick). Indicated is the crucial Arg98 that stabilizes all poses with the exception of the blue-colored solution, which interacts with the acid group (red-colored oxygen atoms) to the opposite side (i.e., Asn67).
Figure 7.11 Comparison of two docking solutions for BASF's uracil derivative UBTZ with the bound INH. UBTZ interacts with Arg98 over the carbonyl oxygen of uracil and a fluorine of the benzothiazole ring (a) or the nitrogen atom of the benzothiazole ring (b).
Figure 7.12 Docking solution for protoporphyrin IX in the Protox binding site. One propionic acid is close to Arg98, but does not form an explicit hydrogen bond. Asterisks indicate the proposed reaction centers C20 of protoporphyrin IX and N5 of FAD (see text for details).
Chapter 8: Chemoinformatics in Modern Regulatory Science
Figure 8.1 Parallel advances in science and computational technology over time.
Figure 8.2 The decision framework for the Threshold of Toxicological Concern.
Figure 8.3 Read-across tree, hierarchy of logical relationships.
Figure 8.4 Chemistry-aware 3-tiered architecture. (a) Typical chemistry-aware RDBS, (b) new technology.
Figure 8.5 Top-level diagram of the data model for the chemistry-centered toxicity database.
Figure 8.6 Typical coverage plot of ToxPrints against datasets. The solid and open circles represent the PAFA and Tox21 datasets respectively.
Figure 8.7 Histogram of structural hits matching with the chemotypes.
Figure 8.8 Hammett constant and the substitution effect on pKa values.
Figure 8.9 Effect of substituents on the charges at the ring carbon atoms (ipso position) and on the pKa values of substituted benzoic acids.
Figure 8.10 Histogram of reaction rules for different chemical inventories.
Figure 8.11 Linear fragments using graph theory and a depth-first search algorithm.
Figure 8.12 Effect of path length and annotation scheme on the number of unique linear paths generated from a set of 4400 compounds from the PAFA database. Annotation options are atom identity (AI), number of heavy-atom connections (nC), number of connected hydrogen atoms (nH), and atom partial charge (PC).
Figure 8.13 Comparison of the number of unique linear paths generated from 4400 compounds from two different datasets: PAFA and Tox21. Annotation scheme used was (AI, nC, nH, PC).
Figure 8.14 Chemical space comparison of four inventories by principal component analysis using ToxPrint chemotypes (a) and physicochemical properties (b).
Chapter 9: Chemometrics in Analytical Chemistry
Figure 9.1 Comparison of raw data with scaled data (a) and the effect of scaling on the calibration line (b).
Figure 9.2 Combination of SRD with ANOVA decomposes the effects of factors in an easily perceivable way. Data preprocessing methods: scl, range scaling; nor, normalization to unit length; rnk, rank transformation; std, standardization (autoscaling); type of tissue: digestive gland, circles; gills, boxes; haemolymph, rhombuses; comet assay evaluation methods: tail intensity (a), tail length (b), Olive tail moment (c).
Figure 9.3 Matrix decomposition in principal component analysis. The loading matrix P′ contains the coefficients of the original variables in the principal components, while the score matrix T contains the principal component scores (values) of the samples. E is an error matrix in cases of a < m. (If a = m, the E matrix is empty, i.e., it contains only zeros.)
Figure 9.4 Thirty chromatographic columns are grouped according to their various polarity metrics with hierarchical cluster analysis using Ward's method (as the linkage rule) and the Euclidean distance. Arbitrary horizontal lines at about 20 or 10 distance units define two or three clusters, respectively.
Figure 9.5 Self-organizing map of a dataset of Italian olive oils and its comparison with the map of Italy and the regions of origin of the olive oil samples. (Copyright 1994, with permission from Elsevier.)
Figure 9.6 LDA plots with confidence ellipsoids and separating lines. Note that the line separating two groups goes, by definition, through the intersections of the two ellipsoids.
Figure 9.7 Transition between the three periods as defined by the coins' metal content.
Figure 9.8 A simple example of a classification tree. Each junction corresponds to a binary decision based on the value of a variable, while each leaf is a possible outcome of the classification. (Note that more than one leaf may classify a sample into a given group.)
Figure 9.9 Regularization parameter space of a support vector machine classification model. Color coding corresponds to the classification performance (the higher the better). Many combinations may produce the same model goodness.
Figure 9.10 An example of n-class ROC curves. Grey curves correspond to individual classes (1–3), while black is the average curve calculated with the Hanley formula. Dashed lines indicate ±1 standard deviation from the average.
Figure 9.11 Schematic representation of PLS regression.
Figure 9.12 Canonical correlation analysis distinguishes five classes (five plant variants, b) from real NIR spectra (a). However, a randomization test clearly shows that five arbitrary classes (d) can also be found for random vectors (c). In the present case, canonical modeling did not pass the randomization test, and a serious decrease in the number of included variables (wavelengths) is necessary.
Figure 9.13 The scheme of repeated double cross-validation.
Chapter 10: Chemoinformatics in Food Science
Figure 10.1 Common structure representations of chemical structures to analyze chemical space. The representation of lipoic acid is used as an example.
Figure 10.2 Generative topographic mapping (GTM) visualization of the chemical space of 1477 generally recognized as safe (GRAS) compounds (blue), 2133 Everything Added to Food in the United States (EAFUS) (green), 1798 approved drugs from DrugBank (red), and 549 compounds tested as DNMT1 inhibitors (black). Molecules are represented using MACCS keys fingerprints (166 bits). The figure was generated using compound databases prepared by Mariana González-Medina.
Figure 10.3 Schematic fingerprint-based representation of flavor descriptors. The figure shows five representative flavor descriptors. The descriptors that are commonly obtained from sensory analysis can be encoded in a binary fingerprint. Molecules contained in foods, such as humulene and lactose found in beer and milk, respectively, could be related to the flavor descriptors.
Figure 10.4 Example of a typical flavor cliff: pair of compounds with high structure similarity but very different flavor.
Figure 10.5 Chemical structures of food-related chemicals identified from computationally driven approaches mentioned in Section 10.6.
Figure 10.6 Chemical structures of compounds associated with inhibition of histone deacetylases and discussed in Section 10.6.2.1.
Figure 10.7 Box plots of six physicochemical properties of food-related chemical databases (GRAS and EAFUS), approved drugs (DrugBank), and inhibitors of DNA methyltransferases (DNMTs).
Figure 10.8 Top five most frequent molecular scaffolds identified in 1477 generally recognized as safe (GRAS) compounds and 2133 Everything Added to Food in the United States (EAFUS) chemicals. For reference, the most populated scaffolds in 1798 approved drugs (from DrugBank), and 549 compounds tested as inhibitors of DNA methyltransferase 1 are shown. For each scaffold, the frequency (and percentage) is indicated above the structure diagram.
Chapter 11: Computational Approaches to Cosmetics Products Discovery
Figure 11.1 Molecular structure of ectoine with the two mesomeric forms (a) and its hydrophilic surface colored according to the corresponding atomic partial charges (b).
Figure 11.2 Stick representation of atoms in the ectoine–water sphere. The gray-colored area of the ectoine–water cluster is presented at a higher resolution to illustrate the molecular composition of the cluster. The small picture corresponds to Figure 11.3, B1.
Figure 11.3 Molecular dynamics simulation of different models containing (A) water, (B) water and ectoine, and (C) water and glycerol. The pictures are taken at the beginning of the simulation (t = 0, A1, B1, C1) and after 200 (A2), 1000 (B2) and 500 ps (C2) at a constant temperature of 370 K. Water clusters around ectoine molecules remain stable for a long period of time, whereas the cluster of water and glycerol breaks down and water molecules diffuse out of the spheres. The pictures represent the number of water molecules counted during the dynamic simulation as shown in Table 11.1. The solutes are colored in green.
Figure 11.4 Cyclopeptide-5 consists of five amino acids. RGD is the one-letter code for the arginine (R), glycine (G), and aspartate (D) sequence. d-Phe represents d-phenylalanine, and ACHA is the short name for aminocyclohexane carboxylic acid.
Figure 11.5 Cartoon of the extracellular part (ectodomain) of the αvβ3 integrin crystal structure in complex with (a) fibronectin, with the RGD loop in the cyan circle, and (b) cyclopeptide-5 as a mimic of the RGD loop in fibronectin.
Figure 11.6 The canonical map of cell adhesion and ECM remodeling. Effects of cyclopeptide-5 treatment (0.5 μM) on gene expression in skin cells are visualized on the map as thermometer-like figures. Upward thermometers with red color indicate up-regulation and downward (blue) ones indicate down-regulation.
Chapter 12: Applications in Materials Science
Figure 12.1 Structural diversity of the nanoworld: zero-dimensional (point), one-dimensional (linear), fractal, two-dimensional, and three-dimensional nanoparticle fragments.
Figure 12.2 Measured and predicted HeLa cell uptake (×10⁻¹¹ g/cell) of nanoparticles with dual ligands on their surfaces. (a) The linear model and (b) the nonlinear model. The training set is denoted by circles, and the test set by triangles.
Figure 12.3 Plots of measured versus predicted Tg values for a dataset consisting of 275 random copolymers in experiment 1, where the training error tolerance (TET) is set at 60 K, and experiment 2, where TET is 30 K.
Figure 12.4 Predicted turnover numbers for 60,000 virtual cross-coupling reactions are plotted versus the first two PCs calculated for all the reaction descriptors. The first PC is correlated mainly with the Pd loading and the electronic descriptors of the organic residue on the alkene, while the second represents the ligand's electronic descriptors.
Figure 12.5 Predicted (QSPR) versus actual (derived from Monte Carlo simulations) methane storage capacities during cross-validation of the 10,000 MOFs in the training set at 35 bar (a) and 100 bar (b).
Figure 12.6 Predicted (QSPR) versus actual (derived from Monte Carlo simulations) methane storage capacities for the 127,953 MOFs in the test set using the SVM models at 35 bar (a) and 100 bar (b).
Figure 12.7 Spotting of mixed comonomers onto a slide. After polymerization by UV light, the tiny polymer spots are sterilized and exposed to cells or bacteria in culture, and the degree of attachment and growth is recorded after specific time intervals.
Figure 12.8 The predicted and measured adhesion of embryoid bodies to polymers in a library of acrylates. The performance of the model in predicting (a) the training set and (b) the test set.
Figure 12.9 Attachment of pathogenic P. aeruginosa to the polyacrylate library.
Figure 12.10 The performance of the three QSPR models for P. aeruginosa (a), S. aureus (b), and UPEC (c). Predictions of attachment to polymers in the training set are shown as black circles, while test set predictions are shown as gray triangles. The attachments are on a log scale.
Figure 12.11 Performance of one of the cell division symmetry markers H2AFZ found by sparse chemoinformatics feature selection. Panels show cell nuclei labeled with DAPI (4′,6-diamidino-2-phenylindole, a fluorescent stain that binds strongly to the cell nucleus) in cells dividing symmetrically (a) and asymmetrically (b). Panels (c) and (d) show the same cells labeled with antibody to H2AFZ expression. In the asymmetric cell division case, only one cell is visible (the stem cell).
Chapter 13: Process Control and Soft Sensors
Figure 13.1 Basic concept of a soft sensor.
Figure 13.2 Detection of abnormal values of an analyzer using a soft sensor.
Figure 13.3 Flow of soft sensor analysis and problems involved at each stage.
Figure 13.4 Basic concepts of the degradation of a linear soft sensor model [10].
Figure 13.5 Case study for the EOSVR method.
Figure 13.6 Comparison of OSVR and EOSVR: time plots of the denitration outlet NH3.
Figure 13.7 Case study for database management: a distillation column.
Figure 13.8 Comparison of a process without database management (a) and with database management (b): time plots of measured and predicted y (MW).
Figure 13.9 Basic concepts of an inverse analysis of a soft sensor model.