Contents
Tables
Boxes
Figures
Getting Files from the Wiley ftp and Internet Sites
LIST OF DATA SETS PROVIDED ON WEB SITE
Preface to the Fourth Edition
PART 1: Basic Concepts
CHAPTER 1: Uses of Sample Surveys
1.1 WHY SAMPLE SURVEYS ARE USED
1.2 DESIGNING SAMPLE SURVEYS
1.3 PRELIMINARY PLANNING OF A SAMPLE SURVEY
EXERCISES
BIBLIOGRAPHY
CHAPTER 2: The Population and the Sample
2.1 THE POPULATION
2.2 THE SAMPLE
2.3 SAMPLING DISTRIBUTIONS
2.4 CHARACTERISTICS OF ESTIMATES OF POPULATION PARAMETERS
2.5 CRITERIA FOR A GOOD SAMPLE DESIGN
2.6 SUMMARY
EXERCISES
BIBLIOGRAPHY
PART 2: Major Sampling Designs and Estimation Procedures
CHAPTER 3: Simple Random Sampling
3.1 WHAT IS A SIMPLE RANDOM SAMPLE?
3.2 ESTIMATION OF POPULATION CHARACTERISTICS UNDER SIMPLE RANDOM SAMPLING
3.3 SAMPLING DISTRIBUTIONS OF ESTIMATED POPULATION CHARACTERISTICS
3.4 COEFFICIENTS OF VARIATION OF ESTIMATED POPULATION PARAMETERS
3.5 RELIABILITY OF ESTIMATES
3.6 ESTIMATION OF PARAMETERS FOR SUBDOMAINS
3.7 HOW LARGE A SAMPLE DO WE NEED?
3.8 WHY SIMPLE RANDOM SAMPLING IS RARELY USED
3.9 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 4: Systematic Sampling
4.1 HOW TO TAKE A SYSTEMATIC SAMPLE
4.2 ESTIMATION OF POPULATION CHARACTERISTICS
4.3 SAMPLING DISTRIBUTION OF ESTIMATES
4.4 VARIANCE OF ESTIMATES
4.5 A MODIFICATION THAT ALWAYS YIELDS UNBIASED ESTIMATES
4.6 ESTIMATION OF VARIANCES
4.7 REPEATED SYSTEMATIC SAMPLING
4.8 HOW LARGE A SAMPLE DO WE NEED?
4.9 USING FRAMES THAT ARE NOT LISTS
4.10 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 5: Stratification and Stratified Random Sampling
5.1 WHAT IS A STRATIFIED RANDOM SAMPLE?
5.2 HOW TO TAKE A STRATIFIED RANDOM SAMPLE
5.3 WHY STRATIFIED SAMPLING?
5.4 POPULATION PARAMETERS FOR STRATA
5.5 SAMPLE STATISTICS FOR STRATA
5.6 ESTIMATION OF POPULATION PARAMETERS FROM STRATIFIED RANDOM SAMPLING
5.7 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 6: Stratified Random Sampling: Further Issues
6.1 ESTIMATION OF POPULATION PARAMETERS
6.2 SAMPLING DISTRIBUTIONS OF ESTIMATES
6.3 ESTIMATION OF STANDARD ERRORS
6.4 ESTIMATION OF CHARACTERISTICS OF SUBGROUPS
6.5 ALLOCATION OF SAMPLE TO STRATA
6.6 STRATIFICATION AFTER SAMPLING
6.7 HOW LARGE A SAMPLE IS NEEDED?
6.8 CONSTRUCTION OF STRATUM BOUNDARIES AND DESIRED NUMBER OF STRATA
6.9 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 7: Ratio Estimation
7.1 RATIO ESTIMATION UNDER SIMPLE RANDOM SAMPLING
7.2 ESTIMATION OF RATIOS FOR SUBDOMAINS UNDER SIMPLE RANDOM SAMPLING
7.3 POSTSTRATIFIED RATIO ESTIMATES UNDER SIMPLE RANDOM SAMPLING
7.4 RATIO ESTIMATION OF TOTALS UNDER SIMPLE RANDOM SAMPLING
7.5 COMPARISON OF RATIO ESTIMATE WITH SIMPLE INFLATION ESTIMATE
7.6 APPROXIMATION TO THE STANDARD ERROR OF THE RATIO ESTIMATED TOTAL
7.7 DETERMINATION OF SAMPLE SIZE
7.8 REGRESSION ESTIMATION OF TOTALS
7.9 RATIO ESTIMATION IN STRATIFIED RANDOM SAMPLING
7.10 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 8: Cluster Sampling: Introduction and Overview
8.1 WHAT IS CLUSTER SAMPLING?
8.2 WHY IS CLUSTER SAMPLING WIDELY USED?
8.3 A DISADVANTAGE OF CLUSTER SAMPLING: HIGH STANDARD ERRORS
8.4 HOW CLUSTER SAMPLING IS TREATED IN THIS BOOK
8.5 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 9: Simple One-Stage Cluster Sampling
9.1 HOW TO TAKE A SIMPLE ONE-STAGE CLUSTER SAMPLE
9.2 ESTIMATION OF POPULATION CHARACTERISTICS
9.3 SAMPLING DISTRIBUTIONS OF ESTIMATES
9.4 HOW LARGE A SAMPLE IS NEEDED?
9.5 RELIABILITY OF ESTIMATES AND COSTS INVOLVED
9.6 CHOOSING A SAMPLING DESIGN BASED ON COST AND RELIABILITY
9.7 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 10: Two-Stage Cluster Sampling: Clusters Sampled with Equal Probability
10.1 SITUATION IN WHICH ALL CLUSTERS HAVE THE SAME NUMBER Ni OF ENUMERATION UNITS
10.2 SITUATION IN WHICH NOT ALL CLUSTERS HAVE THE SAME NUMBER Ni OF ENUMERATION UNITS
10.3 SYSTEMATIC SAMPLING AS CLUSTER SAMPLING
10.4 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 11: Cluster Sampling in Which Clusters Are Sampled with Unequal Probability: Probability Proportional to Size Sampling
11.1 MOTIVATION FOR NOT SAMPLING CLUSTERS WITH EQUAL PROBABILITY
11.2 TWO GENERAL CLASSES OF ESTIMATORS VALID FOR SAMPLE DESIGNS IN WHICH UNITS ARE SELECTED WITH UNEQUAL PROBABILITY
11.3 PROBABILITY PROPORTIONAL TO SIZE SAMPLING
11.4 FURTHER COMMENT ON PPS SAMPLING
11.5 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 12: Variance Estimation in Complex Sample Surveys
12.1 LINEARIZATION
12.2 REPLICATION METHODS
12.3 SUMMARY
EXERCISES
TECHNICAL APPENDIX
BIBLIOGRAPHY
PART 3: Selected Topics in Sample Survey Methodology
CHAPTER 13: Nonresponse and Missing Data in Sample Surveys
13.1 EFFECT OF NONRESPONSE ON ACCURACY OF ESTIMATES
13.2 METHODS OF INCREASING THE RESPONSE RATE IN SAMPLE SURVEYS
13.3 MAIL SURVEYS COMBINED WITH INTERVIEWS OF NONRESPONDENTS
13.4 OTHER USES OF DOUBLE (OR TWO-PHASE) SAMPLING METHODOLOGY
13.5 ITEM NONRESPONSE: METHODS OF IMPUTATION
13.6 MULTIPLE IMPUTATION
13.7 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 14: Selected Topics in Sample Design and Estimation Methodology
14.1 WORLD HEALTH ORGANIZATION EPI SURVEYS: A MODIFICATION OF PPS SAMPLING FOR USE IN DEVELOPING COUNTRIES
14.2 QUALITY ASSURANCE SAMPLING
14.3 SAMPLE SIZES FOR LONGITUDINAL STUDIES
14.4 ESTIMATION OF PREVALENCE OF DISEASES FROM SCREENING STUDIES
14.5 ESTIMATION OF RARE EVENTS: NETWORK SAMPLING
14.6 ESTIMATION OF RARE EVENTS: DUAL SAMPLES
14.7 ESTIMATION OF CHARACTERISTICS FOR LOCAL AREAS: SYNTHETIC ESTIMATION
14.8 EXTRACTION OF SENSITIVE INFORMATION: RANDOMIZED RESPONSE TECHNIQUES
14.9 SUMMARY
EXERCISES
BIBLIOGRAPHY
CHAPTER 15: Telephone Survey Sampling
15.1 INTRODUCTION
15.2 HISTORY OF TELEPHONE SAMPLING IN THE UNITED STATES
15.3 WITHIN-HOUSEHOLD SELECTION TECHNIQUES
15.4 STEPS IN THE TELEPHONE SURVEY PROCESS
15.5 DRAWING AND MANAGING A TELEPHONE SURVEY SAMPLE
15.6 POST-SURVEY DATA ENHANCEMENT PROCEDURES
15.7 IMPUTATION OF MISSING DATA
15.8 DECLINING COVERAGE AND RESPONSE RATES
15.9 ADDRESSING THE PROBLEMS WITH CELL PHONES
15.10 ADDRESS-BASED SAMPLING
EXERCISES
BIBLIOGRAPHY
CHAPTER 16: Constructing the Survey Weights
16.1 INTRODUCTION
16.2 OBJECTIVES OF WEIGHTING
16.3 CONSTRUCTING THE SAMPLE WEIGHTS
16.4 ESTIMATION AND ANALYSIS ISSUES
16.5 SUMMARY
BIBLIOGRAPHY
CHAPTER 17: Strategies for Design-Based Analysis of Sample Survey Data
17.1 STEPS REQUIRED FOR PERFORMING A DESIGN-BASED ANALYSIS
17.2 ANALYSIS ISSUES FOR “TYPICAL” SAMPLE SURVEYS
17.3 SUMMARY
TECHNICAL APPENDIX
BIBLIOGRAPHY
Appendix
Answers to Selected Exercises
Index
WILEY SERIES IN SURVEY METHODOLOGY
Established in Part by WALTER A. SHEWHART AND SAMUEL S. WILKS
Editors: Robert M. Groves, Graham Kalton, J. N. K. Rao, Norbert Schwarz, Christopher Skinner
A complete list of the titles in this series appears at the end of this volume.
Copyright © 2008 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic format. For information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Levy, Paul S.
Sampling of populations : methods and applications / Paul S. Levy, Stanley Lemeshow. — 4th ed. p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-04007-2
1. Population—Statistical methods. 2. Sampling (Statistics) I. Lemeshow, Stanley. II. Title.
HB849.49.L48 2008
304.601′51952—dc21
2008004934
To our wives, Virginia and Elaine,
and our sons, daughters, and grandchildren
Tables
2.1 | Number of Household Visits Made During a Specified Year |
2.2 | Sample Data for Number of Household Visits |
2.3 | Data for Number of Students Not Immunized for Measles Among Six Schools in a Community |
2.4 | Possible Samples and Values of x′ |
2.5 | Sampling Distribution for Data of Table 2.4 |
2.6 | Sampling Procedure for Population of Six Schools |
2.7 | Possible Samples and Values of x′ |
2.8 | Sampling Distribution of x′ |
2.9 | Data for the Burn Area Estimates |
3.1 | Possible Samples of Three Schools and Values of x′ |
3.2 | Sampling Distribution of the x′ in Table 3.1 |
3.3 | Values of the fpc (N = 10,000) |
3.4 | Sample Data for Number of Household Visits |
3.5 | Race and Out-of-Pocket Medical Expenses for Six Families (1987) |
3.6 | Data for a Subdomain Based on the Families in Table 3.5 |
3.7 | Sampling Distribution of z/y |
3.8 | Cumulative Exposure to Pulmonary Stressors and Forced Vital Capacity Among Workers in a Sample Taken at a Plant Employing 1200 Workers |
4.1 | Systematic Sample of One in Six Physicians (from Table 2.1) |
4.2 | Five Possible Samples of One in Five Physicians (from Table 2.1) |
4.3 | Comparison of Means and Standard Errors for Simple Random Sampling and for Systematic Sampling |
4.4 | Six Possible Samples of One in Six Physicians (Table 2.1) |
4.5 | Possible Samples of 1 in k Elements (N/k is an Integer) |
4.6 | Cluster Samples Based on Data of Table 3.8 |
4.7 | Data for Nurse Practitioner’s Visits (Unordered List) |
4.8 | Four Possible Samples for Data of Table 4.7 |
4.9 | Data for Nurse Practitioner’s Visits (Monotonically Ordered List) |
4.10 | Four Possible Samples for Data of Table 4.9 |
4.11 | Data for Nurse Practitioner’s Visits (Periodicity in List) |
4.12 | Four Possible Samples for Data of Table 4.11 |
4.13 | Summary of Results for Four Types of Sampling |
4.14 | Distribution of Remainders and Samples for 25 Possible Random Numbers |
4.15 | Sampling Distribution of Estimated Means from Data of Tables 4.14 and 2.1 |
4.16 | Systematic One-in-Five Sample Taken from Table 2.1 |
4.17 | Days Lost from Work Because of Acute Illness in One Year Among 162 Employees in a Plant |
4.18 | Data for Six Systematic Samples Taken from Table 4.17 |
4.19 | Six Samples Taken from Table 4.17 |
5.1 | Truck Miles and Number of Accidents Involving Trucks by Type of Road Segment |
5.2 | Sampling Distribution of x′ for 56 Possible Samples of Three Segments |
5.3 | Two Strata for Data of Table 5.1 |
5.4 | Sampling Distribution of for 30 Possible Samples of Three Segments |
5.5 | Comparison of Results for Simple Random Sampling and Stratification |
5.6 | Strata for a Population of 14 Families |
6.1 | Retail Prices of 10 Capsules of a Tranquilizer in Pharmacies in Two Communities (Strata) |
6.2 | Possible Samples for the Stratified Random Sample |
6.3 | Sampling Distribution for |
6.4 | General Hospitals in Illinois by Geographical Stratum, 2005 |
6.5 | Strata for Number of Hospital Beds by County Among Counties in Illinois (Excluding Cook County) Having General Hospitals |
6.6 | Number of Pairs Available in Each of 18 Sex-Race-Sequence Difference Quantiles |
6.7 | Total Pairs Available and Total Pairs Sampled |
6.8 | Sample Data of Veterinarian’s Survey |
6.9 | Distribution of Hospital Episodes per Person per Year |
6.10 | Frequency Distribution of Total Amount Charged During 1996 to Medicaid for 2387 Patients Treated by a Large Medical Group |
6.11 | Optimal Allocation Based on Use of the rootfreq Method for Construction of Strata |
6.12 | Results of Three Methods of Strata Construction Combined with Optimal Allocation from Data on 2387 Patients Shown in Table 6.10 |
7.1 | Pharmaceutical Expenses and Total Medical Expenses Among All Residents of Eight Community Areas |
7.2 | Possible Samples of Two Elements from the Population of Eight Elements (Table 7.1) |
7.3 | Samples of Seven Elements from the Population of Table 7.1 |
7.4 | Population (2000 Census) and Current School Enrollment by Census Tract |
7.5 | Possible Samples of Two Schools Taken from Table 7.4 |
7.6 | Values of x′ and x″ for Samples in Table 7.5 |
7.7 | Sample Data from Table 5.1 |
8.1 | Some Practical Examples of Clusters |
8.2 | Comparison of Costs for Two Sampling Designs |
9.1 | Number of Persons over 65 Years of Age and Number over 65 Years Needing Services of Visiting Nurse for Five Housing Developments |
9.2 | Summary Data for the Two Clusters Selected in the Sample |
9.3 | Sampling Distributions of Three Estimates |
9.4 | Means and Standard Errors of Estimates |
9.5 | Number of Eligible Persons and Number Receiving Substance Abuse Treatment Among Individuals Sentenced to Probation in 26 District Courts |
10.1 | Number of Patients Seen by Nurse Practitioners and Number Referred to a Physician for Five Community Health Centers |
10.2 | Summary Data for the Three Clusters Selected in the Sample |
10.3 | Worksheet for Calculations Involving Cluster Totals |
10.4 | Worksheet for Calculations Involving Listing Units |
10.5 | Designs Satisfying Specifications on Total |
10.6 | Designs Satisfying Specifications on Ratio |
10.7 | Amount of Money Billed to Medicare by Day and Week |
10.8 | Designs Satisfying Specifications on Total |
10.9 | Total Admissions with Life-Threatening Conditions and Total Admissions Discharged Dead from Ten Hospitals, 2007 |
10.10 | Summary Data for a Sample of Three Hospitals Selected from the Ten Hospitals in Table 10.9 |
10.11 | Sampling Distribution of , and rclu |
10.12 | Frequency Distribution of Estimated Total Over All Possible Samples of Two Hospitals |
11.1 | Number of Outpatient Surgical Procedures Performed in 1997 in Three Community Hospitals |
11.2 | Number of Women Over 90 Years of Age Admitted to Nursing Homes in a Community During 1997 |
11.3 | Distribution of the Hansen-Hurwitz Estimator Over All Possible Samples of Two Nursing Homes Drawn with Replacement |
11.4 | Number of Women Over 90 Years of Age Admitted to Nursing Homes in a Community During 1997 |
11.5 | Distribution of the Horvitz-Thompson Estimator Over All Possible Samples of Two Nursing Homes Drawn with Replacement |
11.6 | Results of the PPS Sample |
11.7 | Data File for Use in Illustrative Example |
11.8 | Procedure for PPS Sampling with Replacement |
12.1 | Medicaid Payments and Overpayments for 10 Sample Claims Submitted on Patient |
13.1 | Data for the Survey of Physicians |
13.2 | Data that Would be Obtained from the Mail Survey |
13.3 | Actual Values for Missing Data in Table 13.2 |
14.1 | Distribution of |
16.1 | NSCAW Weights for Two Age Domains |
16.2 | NSCAW Response Propensities for Age by Gender Weighting Classes |
16.3 | NSCAW Nonresponse WCA Factors for Age by Gender Weighting Classes |
16.4 | NSCAW Poststratification Adjustment Factors for State Group by Substantiation Poststrata |
16.5 | Final Weight Adjustment Factors for NSCAW Sample |
17.1 | Number of Nursing Homes in Three Regions in a State |
17.2 | Resulting Data from Sample of Nursing Homes |
17.3 | Selection of Sample Subjects from the Departments of Gironde and Dordogne |
17.4 | Association of Wine Consumption vs. Incident Dementia (Model-Based Analysis) |
17.5 | Logistic Regression Analysis of Wine Consumption and Incident Dementia Assuming Simple Random Sampling (Model-Based Analysis) |
17.6 | Logistic Regression Analysis of Wine Consumption and Incident Dementia Incorporating Sample Survey Parameters (Design-Based Analysis) |
A.1 | Random Number Table |
A.2 | Selected Percentiles of Standard Normal Distribution |
Boxes
2.1 | Population Parameters |
2.2 | Sample Statistics |
2.3 | Mean and Variance of Sampling Distribution When Each Sample Has the Same Probability (1/T) of Selection |
2.4 | Mean and Variance of Sampling Distribution When Each Sample Does Not Have the Same Probability of Selection |
3.1 | Estimated Totals, Means, Proportions, and Variances Under Simple Random Sampling, and Estimated Variances and Standard Errors of These Estimates |
3.2 | Population Estimates, and Means and Standard Errors of Population Estimates Under Simple Random Sampling |
3.3 | Coefficients of Variation of Population Estimates Under Simple Random Sampling |
3.4 | Estimated Variances and 100(1−α)% Confidence Intervals Under Simple Random Sampling |
3.5 | Exact and Approximate Sample Sizes Required Under Simple Random Sampling |
4.1 | Estimated Totals, Means, and Variances Under Systematic Sampling, and Estimated Variances, and Standard Errors of These Estimates |
4.2 | Variances of Population Estimates Under Systematic Sampling |
4.3 | Estimation Procedures for Population Means Under Repeated Systematic Sampling |
5.1 | Population and Strata Parameters for Stratified Sampling |
5.2 | Estimates of Population Parameters and Standard Errors of These Estimates for Stratified Sampling |
6.1 | Means and Standard Errors of Population Estimates Under Stratified Random Sampling |
6.2 | Estimated Standard Errors Under Stratified Random Sampling |
6.3 | Estimates of Population Parameters Under Stratified Random Sampling with Proportional Allocation |
7.1 | Formulas for Ratio Estimation Under Simple Random Sampling |
9.1 | Notation Used in Simple One-Stage Cluster Sampling |
9.2 | Estimated Population Characteristics and Estimated Standard Errors for Simple One-Stage Cluster Sampling |
9.3 | Theoretical Standard Errors for Estimates Under Simple One-Stage Cluster Sampling |
9.4 | Exact and Approximate Sample Sizes Required Under Simple One-Stage Cluster Sampling |
10.1 | Notation Used in Simple Two-Stage Cluster Sampling |
10.2 | Estimated Population Characteristics and Estimated Standard Errors for Simple Two-Stage Cluster Sampling |
10.3 | Standard Errors for Population Estimates Under Simple Two-Stage Cluster Sampling |
10.4 | Estimates of Population Characteristics Under Simple Two-Stage Cluster Sampling, Unequal Numbers of Listing Units |
10.5 | Theoretical Standard Errors for Population Estimates for Simple Two-Stage Cluster Sampling, Unequal Numbers of Listing Units |
Figures
2.1 | Relative frequency distribution of the sampling distribution of x′ |
2.2 | Relationship among bias, variability, and MSE for data of Table 2.9 |
3.1 | Areas under normal curve within ±1, ±1.96 and ±3 standard errors of the mean |
9.1 | Form for collecting the data from sample households in housing developments |
9.2 | Cost: simple random sample and single-stage cluster sampling |
14.1 | Network sample for health-care providers and skin cancer patients |
16.1 | The correspondence among the target, frame, and respondent populations and the sample |
16.2 | Distribution of NSCAW final weights |
To download the files listed in this book and other material associated with it, use an ftp program or a Web browser.
If you are using an ftp program, type the following at your ftp prompt or URL prompt:
ftp://ftp.wiley.com
Some programs may provide the first ftp for you, in which case you can type:
ftp.wiley.com
If log-in parameters are required, log in as anonymous (e.g., User ID: anonymous). Leave the password blank. After you have connected to the Wiley ftp site, navigate through the directory path of:
/public/sci_tech_med/populations
Also, a direct link to the related FTP site is available on the book’s Wiley.com webpage.
Under the populations directory are subdirectories that include MATLAB files for PC, Macintosh, and UNIX systems and Microsoft® Excel files. Important information is included in the README files.
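The connection steps above can also be scripted. The following is a minimal sketch using Python's standard ftplib module; it is an illustration, not part of the book's own materials. The host and directory are taken from the instructions above, the function names are hypothetical, and whether the server still accepts anonymous connections is not something the sketch can guarantee.

```python
from ftplib import FTP

# Host and directory as given in the instructions above.
HOST = "ftp.wiley.com"
DIRECTORY = "/public/sci_tech_med/populations"


def list_book_files(ftp, directory=DIRECTORY):
    """Log in anonymously, change to the book's directory, and list its entries.

    `ftp` is any object with the ftplib.FTP interface (login, cwd, nlst).
    """
    ftp.login()          # with no arguments, ftplib logs in as user "anonymous"
    ftp.cwd(directory)   # navigate to the directory named in the text
    return ftp.nlst()    # names of files and subdirectories found there


def download_file(ftp, name, destination):
    """Retrieve one file in binary mode and write it to a local path."""
    with open(destination, "wb") as handle:
        ftp.retrbinary(f"RETR {name}", handle.write)


def show_directory(host=HOST):
    """Hypothetical convenience wrapper: connect, list, and print the entries."""
    with FTP(host) as ftp:
        for name in list_book_files(ftp):
            print(name)
```

Calling show_directory() would attempt a live connection, so it is left for the reader to run; the README files mentioned above are a sensible first download.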
LIST OF DATA SETS PROVIDED ON WEB SITE
Data Set
momsag.dta
workendta
wloss2.ssd
jacktwin.ssd
jacktwin2.dta
dogscats.ssd
tab7ptl.ssd
tab7ptl.dta
bhratio.dat
tab9_la.dta
tab9_lc.ssd
il10ptl.ssd
il10pt2.dta
il10pt2.ssd
hospslct.ssd
exmp12_2.ssd
exmp12_2.dta
amblnce2.ssd
Preface to the Fourth Edition
This fourth edition of Sampling of Populations: Methods and Applications comes nearly ten years after the 1999 publication of the third edition. Unlike the third edition, it did not involve a major change in organization or emphasis. From our own experiences teaching this topic in graduate one-semester courses and in more intensive short courses held over a two-to-three-day period, as well as from positive feedback we received from students or professionals taking courses using this book or else using it for self-learning or reference, we feel that the organization used in the third edition works well; therefore we have kept it basically intact.
We did feel, however, that the book would benefit greatly from a moderate updating and refreshing along the following lines:
During the years between the third and fourth editions, telephone surveys have undergone very rapid changes owing to such phenomena as declining response rates, the institution of the “Do Not Call List,” the increased penetration of cell phones in households, and the greatly increased number of “cell phone only” households. In addition, list-assisted random digit dialing replaced the Mitofsky–Waksberg method for sampling telephone households. To capture these changes, we invited Drs. Michael W. Link and Mansour Fahimi to write a chapter on telephone surveys (Chapter 15). Both Drs. Link and Fahimi are sample survey methodologists who have worked and published intensively in the area of telephone sample surveys. Both are former colleagues of Drs. Levy and Biemer at RTI International. Dr. Link is currently Vice President for Methodological Research and Chief Methodologist at Nielsen, and Dr. Fahimi is presently Vice President for Statistical Research Services at the Marketing Systems Group, which is a major provider of expertise and services for sample surveys through their GENESYS Sampling division.
In summary, we feel that the enhancements mentioned above will provide the reader with a more enjoyable and beneficial experience. We also include material from the Preface to the third edition that we feel might be helpful to readers of this edition.
This fourth edition is likely to be the final edition of Sampling of Populations: Methods and Applications (at least written by us). Therefore, we would like to express our utmost appreciation to our colleagues, students, managers, and staff at all of the institutions with which we were fortunate enough to be associated in the 30+ years since we conceived this book (University of Massachusetts for both Paul and Stan; University of Illinois at Chicago, University of North Carolina at Chapel Hill, and RTI International for Paul; and The Ohio State University for Stan). Stan is particularly appreciative of Ohio State Provost Barbara Snyder’s agreeing to allow him to spend three months away from Columbus on a special research assignment (SRA). This provided him with the time he needed to work on this book. Thanks also to Annick Alpérovitch, Carole Dufouil, and Christophe Tzourio at INSERM Unit 708 in Paris, France for providing an office for Stan and an environment conducive to working on this book during the SRA. We would like to thank our current and former editors: Alex Kugushev at Wadsworth; Beatrice Shube, Helen Ramsey, and Steve Quigley at Wiley. We would also like to thank Nidhi Kochar, Charisse Darrell-Fields, Michael Sabbatino and Tracy McHone for their assistance with various aspects of this book’s preparation. We’re especially appreciative of the expertise Melanie Cole brought to formatting, organizing, and preparing the camera-ready manuscript. Parts of the material in the fourth edition were recently used in Paul’s course at the University of North Carolina and we are grateful to the students in that course for their helpful suggestions and detection of errors. Vince Iannacchione, Paul’s colleague and co-instructor in that course, pointed out several inconsistencies and made numerous helpful comments. Amanda Lewis-Evans designed the cover art. Most of all, we are grateful to our wives, Virginia and Elaine, for putting up with us. It’s been a great ride.
Paul S. Levy
Stanley Lemeshow
Research Triangle Park, North Carolina
Columbus, Ohio
April 2008
Material from the Preface to the Third Edition
The original edition of Sampling of Populations: Methods and Applications was published in 1980 by Lifetime Learning Publications (a Division of Wadsworth, Inc.) under the title Sampling for Health Professionals. Like other Lifetime Learning Books, its primary intended audience was the working professional; in this instance, the practicing statistician. With this as the target audience, the authors felt that such a book on sampling should have the following features:
Sampling for Health Professionals was well received both by reviewers and readers, and had a steady following throughout its existence (1980-91). In addition to having a strong following among practicing statisticians (its intended primary audience), it had been adopted as a primary text or recommended as additional reading by an unexpectedly large number of instructors of sampling courses in various academic units.
Based on the success of this first version of the book, we developed a greatly revised and expanded version that was published by John Wiley & Sons in 1991 under the title Sampling of Populations: Methods and Applications. Our purpose in that revision was to improve the suitability of the book for use as a text in applied sampling courses without compromising its readability or its suitability for the continuation and self-learning markets. The resulting first Wiley edition was a considerably updated and expanded version of the original work, but in much the same style and tone.
Although less than a decade has elapsed since the appearance of the first Wiley edition, there have been major developments both in the design and administration of sample surveys and in the analysis of the resulting data. In particular, refinements in telephone sampling and interviewing methodologies have now made telephone surveys generally more feasible and less expensive than face-to-face household interviewing. While the first edition contained some material on random digit dialing (RDD), it did not cover the various refinements of RDD or the use of list-assisted sampling methods and other innovations that are now widely used.
Also, “user friendly” computer software is much more readily available not only for obtaining standard errors of survey estimates, but also for performing statistical procedures such as contingency table analysis and multiple linear or logistic regression that take into consideration complexities in the sample design. In fact, at the writing of this new edition, modules for the analysis of complex survey data have begun to appear in major general statistical software packages (e.g., Stata). Such software is now widely used and has removed the necessity for many of the complicated formulas that appeared in Chapter 11 as well as elsewhere in the first Wiley edition. Likewise, the analysis methods that appear in Chapter 16 of that edition are a reflection of methods used historically when design-based analysis methodology was not in its present state of development and software was not readily available. We felt that the discussion in that chapter did not reflect adequately the present state of the art with respect to analysis strategies for survey data and we have totally revised it to reflect more closely current practice.
Both of us feel strongly that knowledge of telephone-sampling methodology and familiarity with computer-driven methods now widely used in the analysis of complex survey data should be part of any introduction to sampling methods, and, with this in mind, we have greatly enhanced and revised the material on survey data analysis, and attempted to introduce the use of appropriate software throughout the book in our discussion of the major sample designs and estimation procedures. In addition to adding material on the topics just mentioned, we have made other revisions based on our own experience with the book in sampling courses and on suggestions from students and colleagues. Some of these are listed below.
It is our feeling that one of the strengths of the earlier edition is its focus on the basic principles and methods of sampling. To maintain this focus, we omit or treat very briefly several very interesting topics that have seen considerable development in the last decade. We feel that they are best covered in more specialized texts on sampling. As a result, we do not cover to any extent topics such as distance sampling, adaptive sampling, and superpopulation models that are of considerable importance, but have been treated very well in other volumes. We did, however, include several topics that were not in the previous edition and that we feel are important for a general understanding of sampling methodology. Examples of such topics included in this edition are construction of stratum boundaries and desired number of strata (Chapter 6); estimation of ratios for subdomains (Chapter 7); poststratified ratio estimates (Chapter 7); the Hansen-Hurwitz estimator and the Horvitz-Thompson estimator (Chapter 11).
From our experience with the first edition of Sampling of Populations: Methods and Applications, we feel that this book will be used by practicing statisticians as well as by students taking formal courses in sampling methodology. Both of us teach in schools of public health, and have used this book as the basic text for a one-semester course in sample-survey methodology. Our classes have included a mix of students concentrating in biostatistics, epidemiology, and other areas in the biomedical and social and behavioral sciences. In our experience, this book has been very suitable to this mix of students, and we feel that at least 80% of this material could be covered without difficulty in a single semester course.
Several instructors have indicated that, in their courses on sampling theory, this book works well as a primary text in conjunction with a more theoretical text (e.g., W.G. Cochran, Sampling Techniques, 3rd ed., New York: Wiley, 1977), with the latter text used for purposes of providing additional theoretical background. Conversely, selected readings from our book have been used to provide sampling background to students in broader courses on survey research methodology (often taught in sociology departments).
The students and colleagues who gave us helpful comments and suggestions on our earlier text and on the present volume are too numerous to mention, and we are grateful to all of them. We would like to thank, in particular, Janelle Klar and Elizabeth Donohoe-Cook for carefully reading this manuscript and making valuable editorial and substantive suggestions. In addition, we would like to thank the two anonymous individuals who reviewed an earlier draft of this manuscript for the publisher. Although we did not agree with all of their suggestions, we did take into consideration in our subsequent revision many of their thoughtful and insightful comments. Most of all, however, we wish to recognize the pioneers of sampling methodology who have written the early textbooks in this field. In particular, the books by William Cochran, Morris Hansen, William Hurwitz and William Madow, Leslie Kish, and P.V. Sukhatme are statistical classics that are still widely studied by students, academics, and practitioners. Those of us who cut our teeth on these books and have made our careers in survey sampling owe them a great debt.
Paul S. Levy
Stanley Lemeshow
Chicago, Illinois
Amherst, Massachusetts
December 1998
PART 1
Basic Concepts
Information on characteristics of populations is constantly needed by politicians, marketing departments of companies, public officials responsible for planning health and social services, and others. For reasons relating to timeliness and cost, this information is often obtained by use of sample surveys. Such surveys are the subject of this book.
The following discussion provides an example of a sample survey conducted to obtain information about a health characteristic in a particular population. A health department in a large state is interested in determining the proportion of the state’s children of elementary school age who have been immunized against childhood infectious diseases (e.g., polio, diphtheria, tetanus, and pertussis). For administrative reasons, this task must be completed in only one month.
At first glance, this task would seem to be most formidable, involving the careful coordination of a large staff attempting to collect information, either from parents or from school immunization records, on each and every child of elementary school age residing in that state. Clearly, the budget necessary for such an undertaking would be enormous because of the time, travel expenses, and number of children involved. Even with a sizable staff, it would be difficult to complete such an undertaking in the specified time frame.
To handle problems such as the one outlined above, this text will present a variety of methods for selecting a subset (a sample) from the original set of all measurements (the population) of interest to the researchers. It is the members of the sample who will be interviewed, studied, or measured. For example, in the problem stated above, the net effect of such methods will be that valid and reliable estimates of the proportion of children who have been immunized for these diseases could be obtained in the time frame specified and at a fraction of the cost that would have resulted if attempts were made to obtain the information concerning every child of elementary school age in the state.
More formally, a sample survey may be defined as a study involving a subset (or sample) of individuals selected from a larger population. Variables or characteristics of interest are observed or measured on each of the sampled individuals. These measurements are then aggregated over all individuals in the sample to obtain summary statistics (e.g., means, proportions, and totals) for the sample. It is from these summary statistics that extrapolations can be made concerning the entire population. The validity and reliability of these extrapolations depend on how well the sample was chosen and on how well the measurements were made. These issues constitute the subject matter of this text.
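The definition above can be illustrated with a minimal Python sketch. It assumes a hypothetical population of 10,000 children, 75% of whom are immunized, and a sample of 400 drawn by simple random selection; none of these figures come from the text, and the function names `draw_sample` and `summarize` are illustrative only. The sketch shows measurements being aggregated into a sample proportion, which is then extrapolated to an estimated population total.

```python
import random


def draw_sample(population, n, seed=0):
    """Select n members of the population at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(population, n)


def summarize(sample, population_size):
    """Aggregate 0/1 measurements into a proportion and an estimated total."""
    proportion = sum(sample) / len(sample)
    # Extrapolation: scale the sample proportion up to the whole population.
    estimated_total = population_size * proportion
    return proportion, estimated_total


# Hypothetical population (an assumption): 1 = immunized, 0 = not immunized.
population = [1] * 7500 + [0] * 2500

sample = draw_sample(population, n=400)
proportion, estimated_total = summarize(sample, len(population))

print(f"sample proportion immunized: {proportion:.3f}")
print(f"estimated number immunized in population: {estimated_total:.0f}")
```

How close the sample proportion falls to the true value of 0.75 depends on how the sample was chosen and how large it is, which is the concern of the chapters that follow.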
When all individuals in the population are selected for measurement, the study is called a census. The summary statistics obtained from a census are not extrapolations, since every member of the population is measured. The validity of the resulting statistics, however, depends on how well the measurements are made. The main advantages of sample surveys over censuses lie in the reduced costs and greater speed made possible by taking measurements on a subset rather than on an entire population. In addition, studies involving complex issues requiring elaborate measurement procedures are often feasible only if a sample of the population is selected for measurement since limited resources can be allocated to getting detailed measurements if the number of individuals to be measured is not too great.
In the United States, as in many other countries, governmental agencies are mandated to develop and maintain programs whereby sample surveys are used to collect data on the economic, social, and health status of the population, and these data are used for research purposes as well as for policy decisions. For example, the National Center for Health Statistics (NCHS), a center within the United States Department of Health and Human Services, is mandated by law to conduct a program of periodic and ongoing sample surveys designed to obtain information about illness, disability, and the utilization of health care services in the United States [15]. Similar agencies, centers, or bureaus exist within other departments (e.g., the Bureau of Labor Statistics within the Department of Labor, and the National Center for Educational Statistics within the Department of Education) that collect data relevant to the mission of their departments through a program of sample surveys. Field work for these surveys is sometimes done by the U.S. Bureau of the Census, which also has its own program of surveys, or by commercial firms.
The surveys developed by such government agencies often have extremely complex designs and require very large and highly skilled staff (and, hence, large budgets) for their execution. Although the nature of the missions of these government agencies (provision of valid and reliable statistics on a wide variety of indicators for the United States as a whole and for various subgroups of it) would justify these large budgets, such costs are rarely justified, or even feasible, for most institutions that make use of sample surveys. The information needs of most potential users of sample surveys are far more limited in scope and much more focused on a relatively small set of particular questions. Thus, the types of surveys conducted outside of the federal government are generally simpler in design and “one-shot” rather than ongoing. These are the types of surveys on which we will focus in this text. We will, however, devote some discussion to more complex sample surveys, especially in Chapter 12, which discusses variance estimation methods that have been developed primarily to meet the needs of very complex government surveys.
Sample surveys belong to a larger class of nonexperimental studies generally given the name “observational studies” in the health or social sciences literature. Most sample surveys can be put in the class of observational studies known as “cross-sectional studies.” Other types of observational studies include cohort studies and case-control studies.
Cross-sectional studies are “snapshots” of a population at a single point in time, having as objectives either the estimation of the prevalence or the mean level of some characteristics of the population or the measurement of the relationship between two or more variables measured at the same point in time.
Cohort and case-control studies are used for analytic rather than for descriptive purposes. For example, they are used in epidemiology to test hypotheses about the association between exposure to suspected risk factors and the incidence of specific diseases.
These study designs are widely used to gain insight into relationships. In the business world, for example, a sample of delinquent accounts might be taken (i.e., the “cases”) along with a sample of accounts that are not delinquent (i.e., the “controls”), and the characteristics of each group might be compared for purposes of determining those factors that are associated with delinquency. Numerous examples of these study designs could be given in other fields.
As mentioned above, cohort and case-control studies are designed with the objective in mind of testing some statement (or hypothesis) concerning a set of independent variables (e.g., suspected risk factors) and a dependent variable (e.g., disease incidence). Although such studies are very important, they do not make up the subject matter of this text. The type of study of concern here is often known as a descriptive survey. Its main objective is that of estimating the level of a set of variables in a defined population. For example, in the hypothetical example presented at the beginning of this chapter, the major objective is to estimate, through use of a sample, the proportion of all children of elementary school age who have been vaccinated against childhood diseases. In descriptive surveys, much attention is given to the selection of the sample since extrapolation is made from the sample to the population. Although hypotheses can be tested based on data collected from such descriptive surveys, this is generally a secondary objective in such surveys. Estimation is almost always the primary objective.
In this section, we will discuss the four major components involved in designing sample surveys. These components are sample design, survey measurements, survey operations, and statistical analysis and report generation.
In a sample survey, the major statistical components are referred to as the sample design and include both the sampling plan and the estimation procedures. The sampling plan is the methodology used for selecting the sample from the population. The estimation procedures are the algorithms or formulas used for obtaining estimates of population values from the sample data and for estimating the reliability of these population estimates.
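The split between sampling plan and estimation procedure can be sketched as two separate functions. The sketch below is only one possible instance, not the design the text prescribes: it assumes simple random sampling without replacement as the sampling plan and uses the standard textbook estimator of a proportion and its standard error, with the finite population correction, as the estimation procedure. The function names and the population figures are hypothetical.

```python
import math
import random


def sampling_plan(frame_size, n, seed=0):
    """Sampling plan: choose n distinct positions from a list frame
    of frame_size elements by simple random sampling (an assumption)."""
    rng = random.Random(seed)
    return rng.sample(range(frame_size), n)


def estimate_proportion(values, population_size):
    """Estimation procedure: estimate a population proportion from 0/1
    sample values, together with its estimated standard error."""
    n = len(values)
    p_hat = sum(values) / n
    # Finite population correction: shrinks to 0 when the sample is a census.
    fpc = (population_size - n) / population_size
    std_error = math.sqrt(fpc * p_hat * (1 - p_hat) / (n - 1))
    return p_hat, std_error


# Hypothetical population (an assumption): 1 = has the characteristic.
population = [1] * 8000 + [0] * 2000

selected = sampling_plan(len(population), n=500)
values = [population[i] for i in selected]
p_hat, std_error = estimate_proportion(values, len(population))

print(f"estimated proportion: {p_hat:.3f}")
print(f"estimated standard error: {std_error:.4f}")
```

Keeping the two pieces separate mirrors the definition: the same estimation function could not validly be reused under a different sampling plan, because the formulas for the estimate and its reliability depend on how the sample was selected.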
The choice of a particular sample design should be a collaborative effort involving input from the statistician who will design the survey, the persons involved in executing the survey, and those who will use the data from the survey. The data users should specify what variables should be measured, what estimates are required, what levels of reliability and validity are needed for the estimates, and what restrictions are placed on the survey with respect to timeliness and costs. Those individuals involved in executing the survey should furnish input about costs for personnel, time, and materials as well as input about the feasibility of alternative sampling and measurement procedures. Having received such input, the statistician can then propose a sample design that will meet the required specifications of the users at the lowest possible cost.
Just as sampling and estimation are the statistician’s responsibility in the design of a sample survey, the choice of measurements to be taken and the procedures for taking these measurements are the responsibility of those individuals who are experts in the subject matter of the survey and of those individuals having expertise in the measurement sciences. The former (often called “subject matter persons”) give the primary input into specifying the measurements that are needed in order to meet the objectives of the survey. Once these measurements are specified, the measurement experts—often behavioral scientists or professional survey methodologists with special training and skills in interviewing or other aspects of survey research—begin designing questionnaires or forms to be used in eliciting the data from the sample individuals. The design of a questionnaire or other survey instrument that is suitable for collecting valid and reliable data is often a very complex task. It always requires considerable care and often requires a preliminary study, especially if some of the variables to be measured have never been measured before.
Once the survey instruments have been drafted, the statistician provides input with respect to the procedures to be used to evaluate and assure the quality of the data. In addition, the statistician ensures that the data can be easily coded and processed for statistical analysis and provides input into the strategies and statistical methods that will be used in the analysis.
pilot survey