Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Names: Kottemann, Jeffrey E.
Title: Illuminating statistical analysis using scenarios and simulations/Jeffrey E Kottemann, Ph.D.
Description: Hoboken, New Jersey: John Wiley & Sons, Inc. [2017], | Includes index.
Identifiers: LCCN 2016042825| ISBN 9781119296331 (cloth) | ISBN 9781119296362 (epub)
Subjects: LCSH: Mathematical statistics. | Distribution (Probability theory)
Classification: LCC QA276 .K676 2017 | DDC 519.5—dc23 LC record available at https://lccn.loc.gov/2016042825
The goal of this book is to help people develop an assortment of key intuitions about statistics and inference, and to use those intuitions to make sense of statistical analysis methods in a conceptual as well as a practical way. Moreover, I hope to engender good ways of thinking about uncertainty. The book comprises a series of short, concise chapters that build upon each other and are best read in order. The chapters cover a wide range of concepts and methods of classical (frequentist) statistics and inference. (There are also appendices on Bayesian statistics and on data mining techniques.)
Examining computer simulation results is central to our investigation. Simulating pollsters who survey random people for responses to an agree or disagree opinion question, for example, not only mimics reality but also has the added advantage of letting us employ 1000 independent pollsters simultaneously. The results produced by such simulations provide an eye-opening way to (re)discover the properties of sample statistics and the role of chance, and to (re)invent corresponding principles of statistical inference. The simulation results also foreshadow the various mathematical formulas that underlie statistical analysis.
Mathematics used in the book involves basic algebra. Of particular relevance is interpreting the relationships found in formulas. Take, for example, c = a/b. As a increases, c increases because a is the numerator of the fraction. And as b increases, c decreases because b is the denominator. Going one step further, we could have d = a/(b/c). Here, as c increases, b/c decreases, and since b/c is the denominator, d increases. These functional forms mirror most of the statistical formulas we will encounter.
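These directional relationships are easy to check with a few lines of Python (my sketch; the book itself works through spreadsheets). The formulas c = a/b and d = a/(b/c) here match the reconstruction above and stand in for the statistical formulas to come:

```python
# Illustrative sketch (not from the book): how a ratio responds
# as its numerator and denominator change.
def ratio(a, b):
    # c = a / b
    return a / b

# Increasing the numerator a increases c.
assert ratio(10, 2) > ratio(5, 2)
# Increasing the denominator b decreases c.
assert ratio(10, 4) < ratio(10, 2)

def nested(a, b, c):
    # d = a / (b / c): increasing c shrinks the denominator b / c,
    # so d increases.
    return a / (b / c)

assert nested(10, 2, 4) > nested(10, 2, 2)
```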
As we will see for a wide range of scenarios, simulation results clearly illustrate the terms and relationships found in the various formulas that underlie statistical analysis methods. They also bring to light the underlying assumptions that those formulas and methods rely upon. Last, but not least, we will see that simulation can serve as a robust statistical analysis method in its own right.
Bon voyage
Jeffrey E. Kottemann
My thanks go to Dan Dolk, Gene Hahn, Fati Salimian, and Kathie Wright for their feedback and encouragement. At John Wiley & Sons, thanks go to Susanne Steitz-Filler, Kathleen Pagliaro, Vishnu Narayanan, and Shikha Pahuja.
Before we focus on using statistics as evidence in making judgments, let's take a look at a widely used “verdict outcomes framework.” This general framework is useful for framing judgments in a wide range of situations, including those encountered in statistical analysis.
Anytime we use evidence to arrive at a judgment, there are four generic outcomes possible, as shown in Table 1.1. Two outcomes correspond to correct judgments and two correspond to incorrect judgments, although we rarely know whether our judgments are correct or incorrect. Consider a jury trial in U.S. criminal court. Ideally, the jury is always correct, judging innocent defendants not guilty and judging guilty defendants guilty. Evidence is never perfect, though, and so juries will make erroneous judgments, judging innocent defendants guilty or guilty defendants not guilty.
Table 1.1 Verdict outcomes framework.
In U.S. criminal court, the presumption is that a defendant is innocent until “proven” guilty. Further, convention in U.S. criminal court has it that we are more afraid of punishing an innocent person (type I error) than we are of letting a guilty person go unpunished (type II error). Because of this fear, the threshold for a guilty verdict is set high: “Beyond a reasonable doubt.” So, convicting an innocent person should be a relatively unlikely outcome. In U.S. criminal court, we are willing to have a greater chance of letting a guilty person go unpunished than we are of punishing an innocent person. In short, we need to be very sure before we reject the presumption of innocence and render a verdict of guilty in U.S. criminal court.
We can change the relative chances of the two types of error by changing the threshold. Say we change from “beyond a reasonable doubt” to “a preponderance of evidence.” The former is the threshold used in U.S. criminal court, and the latter is the threshold used in U.S. civil court. Let's say that the former corresponds to being 95% sure before judging a defendant guilty and that the latter corresponds to being 51% sure before judging a defendant guilty. You can imagine cases where the same evidence results in different verdicts in criminal and civil court, which indeed does happen. For example, say that the evidence leads to the jury being 60% sure of the defendant's guilt. The jury verdict in criminal court would be not guilty (60% < 95%) but the jury verdict in civil court would be guilty (60% > 51%). Compared to criminal court, civil court is more likely to declare an innocent person guilty (type I error), but is also less likely to declare a guilty person not guilty (type II error).
Statistical analysis is conducted as if in criminal court. Below are a number of jury guidelines that have parallels in statistical analysis, as we'll see repeatedly.
Statistical analysis formally evaluates evidence in order to determine whether to reject or not reject a stated presumption, and it is primarily concerned with limiting the chances of type I error. Further, the amount of evidence and the variance of evidence are key characteristics of evidence that are formally incorporated into the evaluation process. In what follows, we'll see how this is accomplished.
Let's start with the simplest statistical situation: that of judging whether a coin is fair or not fair. Later we'll see that this situation is statistically equivalent to agree or disagree opinion polling. A coin is fair if it has a 50% chance of coming up heads, and a 50% chance of coming up tails when you flip it. Adjusting the verdict table to the coin-flipping context gives us Table 2.1.
Table 2.1 Coin flipping outcomes.
Intuitively, it seems extremely unlikely for a fair coin to come up heads only 0 or 1 times out of 10, and most people would arrive at the verdict that the coin is not fair. Likewise, it seems extremely unlikely for a fair coin to come up heads 9 or 10 times out of 10, and most people would arrive at the verdict that the coin is not fair. On the other hand, it seems fairly likely for a fair coin to come up heads 4, 5, or 6 times out of 10, and so most people would say that the coin seems fair. But what about 2, 3, 7, or 8 heads? Let's experiment.
Shown in Figure 2.1 is a histogram of what actually happened (in simulation) when 1000 people each flipped a fair coin 10 times. This shows us how fair coins tend to behave. The horizontal axis is the number of heads that came up out of 10. The vertical axis shows the number of people out of the 1000 who came up with the various numbers of heads.
Appendix B gives step-by-step instructions for constructing this simulation using common spreadsheet software; guidelines are also given for constructing additional simulations found in the book.
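For readers who prefer code to spreadsheets, the same experiment can be sketched in a few lines of Python (my illustration, not the book's): 1000 simulated people each flip a fair coin 10 times, and we tally the resulting numbers of heads.

```python
import random

random.seed(1)  # fix the seed so the run is reproducible

FLIPS, PEOPLE = 10, 1000

# Each simulated person flips a fair coin 10 times; tally how many
# people got each possible number of heads (0 through 10).
counts = [0] * (FLIPS + 1)
for _ in range(PEOPLE):
    heads = sum(random.random() < 0.5 for _ in range(FLIPS))
    counts[heads] += 1

# Crude text histogram: number of heads vs. how many people got it.
for h, n in enumerate(counts):
    print(f"{h:2d} heads: {'*' * (n // 10)} ({n})")
```

A run of this sketch piles most of the 1000 people around 4, 5, or 6 heads, mirroring the bell shape of Figure 2.1.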
Sure enough, fair coins very rarely came up heads 0, 1, 9, or 10 times. And, sure enough, they very often came up heads 4, 5, or 6 times. What about 2, 3, 7, or 8 heads?
Notice that 2 heads came up a little less than 50 times out of 1000, or near 5% of the time. Same with 8 heads. And, 3 heads came up well over 100 times out of 1000, or over 10% of the time. Same with 7 heads.
Before expanding the previous Statistical Scenario let's briefly explore why the histogram, reproduced in Figure 3.1, is shaped the way it is: bell-shaped. It tapers off symmetrically on each side from a single peak in the middle.
Since each coin flip has two possible outcomes and we are considering ten separate outcomes together, there are a total of 2¹⁰ = 1024 unique possible patterns (permutations) of heads and tails with 10 flips of a coin. Of these, there is only one with 0 heads and only one with 10 heads. These are the least likely outcomes.
TTTTTTTTTT | HHHHHHHHHH |
There are ten with 1 head, and ten with 9 heads:
HTTTTTTTTT | THHHHHHHHH |
THTTTTTTTT | HTHHHHHHHH |
TTHTTTTTTT | HHTHHHHHHH |
TTTHTTTTTT | HHHTHHHHHH |
TTTTHTTTTT | HHHHTHHHHH |
TTTTTHTTTT | HHHHHTHHHH |
TTTTTTHTTT | HHHHHHTHHH |
TTTTTTTHTT | HHHHHHHTHH |
TTTTTTTTHT | HHHHHHHHTH |
TTTTTTTTTH | HHHHHHHHHT |
Since there are 10 times more ways to get 1 or 9 heads than 0 or 10 heads, we expect to flip 1 or 9 heads 10 times more often than 0 or 10 heads.
Further, there are 45 ways to get 2 or 8 heads, 120 ways to get 3 or 7 heads, and 210 ways to get 4 or 6 heads. Finally, there are 252 ways to get 5 heads, which is the most likely outcome and therefore the most frequently expected outcome. Notice how the shape of the histogram of simulation outcomes we saw in Figure 3.1 closely mirrors the number of ways (#Ways) chart that is shown in Figure 3.2.
You don't need to worry about calculating #ways. Soon we won't need such calculations. Just for the record, the formula for #ways is n!/(h!(n − h)!), where n is the number of flips, h is the number of heads you are interested in, and ! is the factorial operation (example: 4! = 4 × 3 × 2 × 1 = 24). In official terms, #ways is the number of combinations of n things taken h at a time.
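As a check on the counts above, Python's built-in math.comb computes the number of combinations directly (a sketch of mine, not part of the book):

```python
from math import comb, factorial

n = 10  # number of flips

# #Ways to get h heads in n flips: n! / (h! * (n - h)!),
# which is exactly math.comb(n, h).
ways = [comb(n, h) for h in range(n + 1)]
print(ways)       # [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
print(sum(ways))  # 1024, i.e., 2**10 total patterns

# The factorial formula and the built-in agree, e.g., for h = 3:
assert comb(n, 3) == factorial(n) // (factorial(3) * factorial(n - 3))
```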
Let's revisit Statistical Scenario–Coins #1, now with additional information on each of the possible outcomes. Table 4.1 summarizes this additional information. As noted, there are a total of 2¹⁰ = 1024 different unique patterns of heads and tails possible when we flip a coin 10 times. For any given number of heads, as we have just seen, there are one or more ways to get that number of heads.
Table 4.1 Coin flipping details.
#Heads | #Ways | Expected relative frequency | Probability | as Percent | Rounded |
0 | 1 | 1/1024 | 0.00098 | 0.098% | 0.1% |
1 | 10 | 10/1024 | 0.00977 | 0.977% | 1.0% |
2 | 45 | 45/1024 | 0.04395 | 4.395% | 4.4% |
3 | 120 | 120/1024 | 0.11719 | 11.719% | 11.7% |
4 | 210 | 210/1024 | 0.20508 | 20.508% | 20.5% |
5 | 252 | 252/1024 | 0.24609 | 24.609% | 24.6% |
6 | 210 | 210/1024 | 0.20508 | 20.508% | 20.5% |
7 | 120 | 120/1024 | 0.11719 | 11.719% | 11.7% |
8 | 45 | 45/1024 | 0.04395 | 4.395% | 4.4% |
9 | 10 | 10/1024 | 0.00977 | 0.977% | 1.0% |
10 | 1 | 1/1024 | 0.00098 | 0.098% | 0.1% |
Totals: | 1024 | 1024/1024 | 1.0 | 100% | 100% |
The #ways divided by 1024 gives us the expected relative frequency for that number of heads expressed as a fraction. For example, we expect to get 5 heads 252/1024ths of the time. The fraction can also be expressed as a decimal value. This decimal value can be viewed as the probability that a certain number of heads will come up in 10 flips. For example, the probability of getting 5 heads is approximately 0.246. We can also express this as a percentage, 24.6%.
A probability of 1 (100%) means something will always happen and a probability of 0 (0%) means something will never happen. A probability of 0.5 (50%) means something will happen half the time. The probabilities of the entire set of possible outcomes always sum to 1 (100%). The probability of a subset of possible outcomes can be calculated by summing the probabilities of each outcome in the subset. For example, using the rounded percentages from the table, the probability of 2 or fewer heads is 0.1% + 1.0% + 4.4% = 5.5%.
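These subset sums are easy to verify in code. The following Python sketch (mine, not the book's) rebuilds the probability column of Table 4.1 from #ways/1024 and sums the probabilities of 2 or fewer heads:

```python
from math import comb

n = 10
# Probability of exactly h heads in 10 fair flips: #ways / 2**10.
p = [comb(n, h) / 2**n for h in range(n + 1)]

# The full set of outcome probabilities sums to 1.
assert abs(sum(p) - 1.0) < 1e-12

# Probability of 2 or fewer heads: sum the outcome probabilities.
p_le2 = p[0] + p[1] + p[2]
print(round(p_le2, 4))  # 0.0547, about 5.5%
```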
Notice how the bars of our simulation histogram, reproduced in Figure 4.1, reflect the corresponding probabilities in Table 4.1.
Say someone gives you a coin to test. When you flip the coin 10 times, you are sampling the coin's behavior 10 times. The number of heads you toss is your evidence. Based on this evidence you must decide whether to reject your presumption of fairness and judge the coin as not fair.
What happens if you make your “verdict rule” to be:
Verdict “coin is not fair” if #heads falls outside the interval 1 ≤ #heads ≤ 9, as shown in Table 4.2 and the accompanying Figure 4.2?
From the Statistical Scenario Table 4.1, we can see that a fair coin will come up 0 heads about 0.1% of the time, and 10 heads about 0.1% of the time. The sum is about 0.2% of the time, or about 2 out of 1000. So, it will be extremely rare for us to make a type I error and erroneously call a fair coin unfair, because fair coins will almost never come up with 0 or 10 heads. However, what about 1 head or 9 heads? Our rule says not to call those coins unfair. But a fair coin will only come up 1 head or 9 heads about 2% (1.0% + 1.0%) of the time. Therefore, we may end up misjudging many unfair coins that come up heads one or nine times, because we'll declare them to be fair coins. That is type II error.
Table 4.2 First verdict rule scenario.
Determining the chance of type II error is too involved for discussion now (that is Chapter 17), but recall from Chapter 1 that increasing the chance of type I error decreases the chance of type II error, and vice versa.
To lower the chances of type II error, we can narrow our “verdict rule” interval to 2 ≤ #heads ≤ 8 as shown in Table 4.3 and Figure 4.3. Now the probability of making a type I error is about 2.2% (0.1% + 1.0% + 1.0% + 0.1%). This rule will decrease the chances of type II error, while increasing the chances of type I error from 0.2% to 2.2%.
If we narrow our “verdict rule” interval even more, to 3 ≤ #heads ≤ 7, we get Table 4.4 and Figure 4.4.
Table 4.3 Second verdict rule scenario.
Now the probability of making a type I error is about 11%, because a fair coin will come up 0, 1, 2, 8, 9, or 10 heads about 11% of the time. We can express this uncertainty by saying either that there will be an 11% chance of a type I error, or that we are 89% confident that there will not be a type I error. Notice that this is what we came up with earlier by simply eyeballing the histogram of actual simulation outcomes in Chapter 2.
Table 4.4 Third verdict rule scenario.
As noted earlier, type I error is feared most, and an 11% chance of type I error is usually seen as excessive. So, we can adopt the rule: verdict “coin is not fair” if #heads falls outside the interval 2 ≤ #heads ≤ 8. This gives us about a 2% chance of type I error.
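As a cross-check on the three verdict rules, the chance of a type I error under each rule can be computed directly from the outcome probabilities (a Python sketch of mine, not from the book):

```python
from math import comb

n = 10
# Probability of exactly h heads in 10 fair flips.
p = [comb(n, h) / 2**n for h in range(n + 1)]

def type1_error(lo, hi):
    """Chance that a fair coin lands outside [lo, hi] heads,
    i.e., the chance the rule wrongly calls a fair coin unfair."""
    return sum(p[h] for h in range(n + 1) if h < lo or h > hi)

print(round(type1_error(1, 9), 4))  # 0.002  (about 0.2%)
print(round(type1_error(2, 8), 4))  # 0.0215 (about 2.2%)
print(round(type1_error(3, 7), 4))  # 0.1094 (about 11%)
```

Narrowing the interval raises the type I error chance, just as the tables show.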
From now on, we'll typically use the following threshold levels for type I error: 10% (0.10), 5% (0.05), and 1% (0.01). We'll see the effects of using various thresholds as we go along. Also as we go along we'll need to replace some common words with statistical terminology. Below are statistical terms to replace the common words we have been using.
It is important to emphasize that simulation histograms represent sampling distributions that tell us what to expect when the null hypothesis is true. We'll look at many, many sampling distribution histograms in this book. For the remainder of Part I, we'll switch from using counts as our sample statistic to using proportions as our sample statistic. The sampling distribution histogram in Figure 4.5 shows the use of proportions rather than counts on the horizontal axis.