Chemometrics

Data Driven Extraction for Science

 

Richard G. Brereton

University of Bristol (Emeritus)
UK

 

Second Edition


Wiley

Preface to Second Edition

The first edition of this book has been well received, especially for its emphasis on numerical illustration of a wide range of chemometric methods. Of particular importance were the problems at the end of each chapter, which readers could work through in their own favourite environment, such as Excel or Matlab, but also R, Python, Fortran or any number of other languages or computational packages if desired. I have performed calculations in both Matlab and Excel, but readers should not feel restricted if they prefer an alternative.

The reader of this book is likely to be an applied scientist or statistician who wishes to understand the basis and motivation of many of the main methods used in chemometrics.

Since the first edition, chemometrics has become much more widespread, including outside mainstream chemistry. In the early 2000s, the major applications were quantitative laboratory analytical science and chemical engineering, including process control. Over the past few years, application areas have broadened as large analytical laboratory-generated data sets have become more widely available, for example, in metabolomics, heritage science and food science. This is reflected in a larger emphasis on pattern recognition in the second edition, including some practical case studies from metabolomics in the form of worked problem sets.

Despite this, many of the original building blocks of the subject remain unchanged. A factorial design and a principal component are still the same, so parts of the text involve only small changes from the first edition. Nevertheless, feedback from students and co-workers of mine, and comments via the Internet, have provided valuable guidance as to what changes are desirable for a second edition. Important structural changes, such as multiple choice questions throughout the book and colour printing, update the original edition as a modern-day textbook.

Some major updates are as follows.

To supplement this book, all data sets, both from the main text and from the problems at the end of each chapter, are downloadable. In addition, there is a downloadable Excel add-in to perform most of the common multivariate methods and a macro for labelling graphs. Matlab routines corresponding to many of the main methods are also available, as are the answers to the problems at the end of each chapter. These are available on the Wiley website associated with this book.

It is hoped that this text will be useful for students wishing to obtain a fundamental understanding of many chemometric methods. It will also be useful for any practising chemometrician who needs to work through methods they may have only recently encountered, using numerical examples: as a researcher, when I encounter an unfamiliar approach, I usually like to reproduce numerical data from published case studies to check how it works before I am confident enough to use the method. For people encountering chemometrics for the first time, for example, in metabolomics and heritage science, this book presents many of the most widespread methods and so will serve as a good reference. As a refresher, the multiple choice questions test basic understanding. The worked case studies can be collected together and are helpful for courses.

Finally, I thank the publishers, especially Jenny Cossham, who have encouraged the development of this rather complex project through many stages, and also the colleagues who have provided data, as listed in the acknowledgements.

Bristol, May 2017

Richard G. Brereton

Preface to First Edition

This book is a product of several years' activities on my part. First and foremost, the task of educating graduate students in my research group from a large variety of backgrounds over the past 10 years has been a significant formative experience, and this has allowed me to develop a large series of problems, which we set every 3 weeks and for which we present answers in seminars. From my experience, this is the best way to learn chemometrics! In addition, I have had the privilege to organise international quality courses, mainly for industrialists, with the participation as tutors of many representatives of the best organisations and institutes around the world, and I have learnt from them. Different approaches are normally taken when teaching industrialists, who may be encountering chemometrics for the first time in mid-career and have a limited period of a few days to attend a condensed course, and university students, who have several months or even years to practise and improve. However, it is hoped that this book represents a symbiosis of both needs.

In addition, it has been a great inspiration for me to write a regular fortnightly column for Chemweb (available to all registered users on www.chemweb.com), and some of the material in this book is based on articles first available in this format. Chemweb brings a large reader base to chemometrics, and feedback via e-mails or even travels around the world has helped me formulate my ideas. There is a very wide interest in this subject, but it is somewhat fragmented. For example, there is a strong group of Near Infrared Spectroscopists, primarily in the USA, that has led to the application of advanced ideas in process monitoring and that sees chemometrics as a quite technical, industrially oriented subject. There are other groups of mainstream chemists who see chemometrics as applicable to almost all branches of research, ranging from kinetics to titrations to synthesis optimisation. Satisfying all these diverse people is not an easy task.

This book relies mainly on numerical examples: many in the body of the text come from my favourite research interests, which are primarily in analytical chromatography and spectroscopy. To expand the text further would produce a huge book of twice the size, so I ask the indulgence of readers whose area of application differs. Certain chapters, such as those on calibration, could be approached from widely different viewpoints, but the methodological principles are the most important, and if you understand how the ideas can be applied in one area, you will be able to translate them to your own favourite application. In the problems at the end of each chapter, I cover a wider range of applications to illustrate the broad basis of these methods. The emphasis of this book is on understanding ideas, which can then be applied to a wide variety of problems in chemistry, chemical engineering and allied disciplines.

It is difficult to select what material to include in this book without making it too long. Every expert I have shown this book to has made suggestions for new material. I am most grateful for every proposal: some I have taken into account, and others I have mentioned briefly or not at all, mainly for reasons of length and also to ensure that this book sees the light of day rather than constantly expanding without end. There are many outstanding specialist books for the enthusiast. It is my experience, though, that if you understand the main principles (which are quite a few in number) and constantly apply them to a variety of problems, you will soon pick up the more advanced techniques, so it is the building blocks that are most important.

In a book of this nature, it is very difficult to decide what detail is required for the various algorithms: some readers will have no real interest in the algorithms, whereas others will feel the text is incomplete without comprehensive descriptions. The main algorithms for common chemometric methods are presented in Appendix A.2. Step-by-step descriptions of methods, rather than algorithms, are presented in the text. A few approaches that will interest some readers, such as cross-validation in PLS, are described in the problems at the end of the appropriate chapters, which supplement the text. It is expected that readers will approach this book with different levels of knowledge and expectations, so it is possible to gain a great deal without having an in-depth appreciation of computational algorithms, but for interested readers, the information is nevertheless available. People rarely read texts in a linear fashion; they often dip in and out of parts according to their background and aspirations. Chemometrics is a subject that people approach with very different previous knowledge and skills, so it is possible to gain from this book without covering every topic in full. Many readers will simply use add-ins or Matlab commands and be able to produce all the results in this text.

Chemometrics uses a very large variety of software. In this book, we recommend two main environments, Excel and Matlab; the examples have been tried using both environments, and you should be able to get the same answers in both cases. Users of this book will vary from people who simply want to plug the data into existing packages to those who are curious and want to reproduce the methods in their own favourite language such as Matlab, VBA or even C. In some cases, instructors may use the information available with this book to tailor examples for problem classes. Extra software supplements are available via the publishers' website www.SpectroscopyNOW.com, together with all the data sets in this book.

The problems at the end of each chapter form an important part of the text, the examples being a mixture of simulations (which have an important role in chemometrics) and real case studies from a wide variety of sources. For each problem, the relevant sections of the text that provide further information are referenced. However, a few problems build on the existing material and take the reader further: a good chemometrician should be able to use the basic building blocks to understand and use new methods. The problems are of various types; thus, not every reader will want to solve all the problems. In addition, instructors can use the data sets to construct workshops or course material that goes further than the book.

I am very grateful for the tremendous support I have had from many people when asking for information and help with data sets and permission where required. I thank Chemweb for agreement to present material modified from articles originally published in their e-zine, The Alchemist, and the RSC for permission to base the text of Chapter 5 on material originally published in the Analyst (125, 2125–2154 (2000)). A full list of acknowledgements for the data sets used in this text is presented after this foreword.

I thank Tom Thurston and Les Erskine for a superb job on the Excel add-in, and Hailin Shen for outstanding help in Matlab. Numerous people have tested the answers to the problems. Special mention should be given to Christian Airiau, Kostas Zissis, Tom Thurston, Conrad Bessant and Cevdet Demir for access to a comprehensive set of answers on disc for a large number of exercises so that I could check mine. In addition, several people have read chapters and made detailed comments, especially checking numerical examples; in particular, I thank Hailin Shen for suggestions about improving Chapter 6 and Mohammed Wasim for careful checking for errors. In some ways, the best critics are the students and postdocs working with me because they are the people who have to read and understand a book of this nature, and it gives me great confidence that my co-workers in Bristol have found this approach useful and have been able to learn from the examples.

Finally, I thank the publishers for taking a germ of an idea and making valuable suggestions as to how this could be expanded and improved to produce what I hope is a successful textbook, and for having faith and patience over a protracted period.

Bristol, February 2002

Richard G. Brereton