In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey. The term "survey" may refer to many different types or techniques of observation. In survey sampling it most often involves a questionnaire used to measure the characteristics and/or attitudes of people. Different ways of contacting members of a sample once they have been selected is the subject of survey data collection. The purpose of sampling is to reduce the cost and/or the amount of work that it would take to survey the entire target population. A survey that measures the entire target population is called a census.
Survey samples can be broadly divided into two types: probability samples and non-probability samples. Probability-based samples implement a sampling plan with specified probabilities (perhaps adapted probabilities specified by an adaptive procedure). Probability-based sampling allows design-based inference about the target population. The inferences are based on a known objective probability distribution that was specified in the study protocol. Inferences from probability-based surveys may still suffer from many types of bias.
Surveys that are not based on probability sampling have greater difficulty measuring their bias or sampling error. Surveys based on non-probability samples often fail to represent the people in the target population.
In academic and government survey research, probability sampling is a standard procedure. In the United States, the Office of Management and Budget's "List of Standards for Statistical Surveys" states that federally funded surveys must be performed:
selecting samples using generally accepted statistical methods (e.g., probabilistic methods that can provide estimates of sampling error). Any use of nonprobability sampling methods (e.g., cut-off or model-based samples) must be justified statistically and be able to measure estimation error.
For example, many surveys have substantial amounts of nonresponse. Even though the units are initially chosen with known probabilities, the nonresponse mechanisms are unknown. For surveys with substantial nonresponse, statisticians have proposed statistical models, with which data sets are analyzed.
Issues related to survey sampling are discussed in several sources including Salant and Dillman (1994).
In a probability sample (also called "scientific" or "random" sample) each member of the target population has a known and non-zero probability of inclusion in the sample. A survey based on a probability sample can in theory produce statistical measurements of the target population that are:
A probability-based survey sample is created by constructing a list of the target population, called the sampling frame, a randomized process for selecting units from the sample frame, called a selection procedure, and a method of contacting selected units to and enabling them complete the survey, called a data collection method or mode. For some target populations this process may be easy, for example, sampling the employees of a company by using payroll list. However, in large, disorganized populations simply constructing a suitable sample frame is often a complex and expensive task.
Common methods of conducting a probability sample of the household population in the United States are Area Probability Sampling, Random Digit Dial telephone sampling, and more recently, Address-Based Sampling.
Within probability sampling, there are specialized techniques such as stratified sampling and cluster sampling that improve the precision or efficiency of the sampling process without altering the fundamental principles of probability sampling.
Stratification is the process of dividing members of the population into homogeneous subgroups before sampling. The strata should be mutually exclusive: every element in the population must be assigned to only one stratum. The strata should also be collectively exhaustive: no population element can be excluded. Then methods such as simple random sampling or systematic sampling can be applied within each stratum. This often improves the representativeness of the sample by reducing sampling error.
Bias in surveys is undesirable, but often unavoidable. The major types of bias that may occur in the sampling process are:
Many surveys are not based on probability samples, but rather on finding a suitable collection of respondents to complete the survey. Some common examples of non-probability sampling are:
In non-probability samples the relationship between the target population and the survey sample is immeasurable and potential bias is unknowable. Sophisticated users of non-probability survey samples tend to view the survey as an experimental condition, rather than a tool for population measurement, and examine the results for internally consistent relationships.
The textbook by Groves et alia provides an overview of survey methodology, including recent literature on questionnaire development (informed by cognitive psychology) :
The other books focus on the statistical theory of survey sampling and require some knowledge of basic statistics, as discussed in the following textbooks:
The elementary book by Scheaffer et alia uses quadratic equations from high-school algebra:
More mathematical statistics is required for Lohr, for Särndal et alia, and for Cochran (classic):
The historically important books by Deming and Kish remain valuable for insights for social scientists (particularly about the U.S. census and the Institute for Social Research at the University of Michigan):