
Understanding statistical error

A primer for biologists



Marek Gierliński

University of Dundee









Wiley





Errors, like straws, upon the surface flow;
He who would search for pearls must dive below

                    —John Dryden (1631–1700)

Introduction

Why would you read an introduction?

Almost every nonfiction book is preceded by an ‘introduction’, a ‘preface’ or a ‘foreword’, or sometimes a combination of these. If you are (un)lucky, you might find a note from the Editor, a foreword followed by the preface to the first edition, a preface to the second edition and a general introduction. There, first of all, you can read about how great the author is. Next, you will learn that the book is unique and better than every other book written on the topic so far. Then the author will delve into a painstakingly detailed description of each chapter, which, by the way, can be found in the table of contents. Finally, there is time for the compulsory acknowledgements to all the family and friends whom the author forced to read his or her magnum opus. There is no escape: forewords, prefaces and introductions are everywhere. Stanisław Lem once wrote a book consisting entirely of forewords (Lem 1979).

People usually skip all of these intros because they are boring, pretentious, self-righteous and useless. All right, are you still with me? If you have made it this far, you might be one of the few who actually read introductions. Very well, then. I'll try to be brief, to the point and not too conceited.

What is this book about?

As the title suggests, this book is about error analysis, with emphasis on applications in biology or, more generally, in the life sciences. Since the time of the great Ronald Fisher, statistics has become an inherent part of biology. Very few numerical results from biological or medical studies make their way into publication without their statistical significance being confirmed. One way of doing this is to provide a p-value from a statistical test, that is – roughly speaking – the probability of obtaining a result at least as extreme as the one observed if there were no real effect. That is what this book is not about.

The other way of assessing the significance of a result is to find its inherent error, or uncertainty. In my view, a numerical result quoted without any kind of uncertainty is meaningless. Hence, it is good to know how to calculate errors. And that is what this book is about.

Here I discuss various aspects of error analysis: a bit of theoretical background and practical ways of calculating confidence intervals, but also the graphical presentation of error bars and how to quote numbers with errors. I put emphasis on intuition and understanding rather than on computational recipes, although I do give exact formulae for many types of error. Beware: this is not a comprehensive book on statistics; rather, it focuses on the practical understanding of uncertainty analysis. You can find more details in the table of contents, right after this introduction.

Who is this book for?

This book is written for the inquisitive biologist who wants to improve his or her understanding of data analysis. While a biologist is my target reader, the book may be useful for anyone who deals with numerical data and wants to learn more about how to evaluate and compare measurements. If you calculate various types of error using a software package and would like to find out where these errors come from, this book is for you. If you use standard deviations, standard errors and confidence intervals, but you are not sure what they really mean, this book is for you. If you struggle to find the error of the median or of a correlation coefficient, this book is for you. Or perhaps you are just curious and would like to learn a few basic things about uncertainty analysis – this book is also for you.

About maths

Although there are a few attempts in the literature to take a purely intuitive approach (e.g. Motulsky 2010), I believe that it is very difficult to do statistics without maths. Plain English explanations cannot replace the precision of a mathematical equation, and a simple derivation can explain where a given formula comes from. Hence, there is maths in this book. Not very complex, not very extensive, but maths there is.

Needless to say, equations are required in practical applications, so if you need to find a particular uncertainty not provided by the statistical software you normally use, you can employ the equations from this book. They can easily be implemented in any programming language, or even in a spreadsheet. The mathematics in this book is quite basic; it doesn't really go beyond the level taught in a typical secondary school. Most equations contain simple algebra and sums. The most advanced operator I use is the derivative.

I don't want to scare potential readers away. This is not a mathematical textbook! I use equations only when necessary and I always try to accompany them with an intuitive explanation. Often, I show the results of a computer simulation to illustrate the meaning of a concept or formula. I have also made a few simplifications and approximations here and there, at the expense of mathematical correctness. I hope this makes the maths in this book much easier to understand.

I need to finish with a caveat. This is a book written primarily for biologists, not for mathematicians or physicists. Hence, there are no mathematical proofs, some derivations are not strict and there is a general lack of mathematical rigour. A mathematician might scowl at the content of this book, so if you are one, please shut your eyes now.

Acknowledgements

I would like to thank Professor Angus Lamond, who carefully read the manuscript from cover to cover and gave me many invaluable comments. As a biologist, he helped me better understand my target reader (you!). He also helped me with my English, which is not my first language.