In any Six Sigma project, the team has to take a sample of the population it is studying, and Lean Six Sigma courses cover a variety of sampling methods for this purpose. Sampling is usually applied in the Measure phase of the Six Sigma DMAIC cycle. Six Sigma Green Belt training distinguishes between two main types of sampling methods: probability sampling methods and non-probability sampling methods. Let’s take a closer look at these two types and their sub-types.
Probability Sampling Methods
Probability sampling methods are those in which units are selected from the population according to known probabilities. Examples include simple random sampling and sampling with probability proportional to size. With these methods, it is possible to determine both which sampling units belong to which possible sample and the probability that each sample will be selected. In other words, every possible sample has a known probability of being selected.
The following sampling methods are examples of probability sampling methods:
- Simple Random Sampling (SRS)
- Stratified Sampling
- Cluster Sampling
- Systematic Sampling
Probability Sampling Methods: Simple Random Sampling (SRS)
Simple random sampling is the basic probability sampling technique and a special case of random sampling. We select a group of subjects (a sample) for study from a larger group (the population). Each individual is chosen entirely by chance, and each member of the population has an equal chance of being included in the sample. More strictly, every possible sample of a given size has the same chance of selection. It is also known as ‘unrestricted random sampling’.
There are 2 types of simple random sampling: SRS without replacement and SRS with replacement. When a population element can be selected more than once, it is known as sampling with replacement. When a population element can be selected only once, it is known as sampling without replacement.
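The difference between the two variants can be illustrated with a minimal Python sketch. The population of 20 unit IDs, the sample size of 5 and the seed are assumptions made up for this illustration; they are not taken from any real project.

```python
import random

# Hypothetical population: 20 unit IDs (illustrative only).
population = list(range(1, 21))
rng = random.Random(42)  # fixed seed so the sketch is repeatable

# SRS without replacement: each unit can appear at most once.
srs_without = rng.sample(population, k=5)

# SRS with replacement: the same unit may be drawn more than once.
srs_with = rng.choices(population, k=5)

print("without replacement:", srs_without)
print("with replacement:   ", srs_with)
```

Without replacement the five units are always distinct, whereas with replacement duplicates are possible.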
Probability Sampling Methods: Stratified Random Sampling
A procedure commonly used in surveys is ‘stratified sampling’. This technique is mainly used to reduce the effect of population heterogeneity (diversity) and to increase the efficiency of the estimates.
Stratification means division into groups. In this method, the population is divided into a number of subgroups, or strata; an individual group is called a ‘stratum’. The strata should be formed in such a way that each stratum is as homogeneous as possible. Then, from each stratum, a simple random sample may be selected, and these samples are combined to form the required sample from the population.
With stratified sampling one should:
- Partition the population into groups, known as ‘strata’
- Obtain a simple random sample from each group (stratum)
- Collect data on each sampling unit that was randomly sampled from each stratum
There are 2 types of stratified sampling methods: proportional and non-proportional. In proportional sampling, each stratum is represented in proportion to its size: a stratum with a large number of items contributes a larger share of the sample, and vice versa. The population size is denoted by ‘N’ and the sample size by ‘n’; for stratum h, write N_h for its size and n_h for the number of units sampled from it. The sample size is allocated so that the sampling fraction is constant across strata, that is, n_h/N_h = n/N = c. Thus, in this method, each stratum is represented according to its size.
In non-proportional sampling, equal representation is given to all the strata regardless of their share of the population.
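The two allocation rules can be compared with a short Python sketch. The function names, the three strata and their sizes are hypothetical, chosen only so that the arithmetic is easy to check by hand.

```python
def proportional_allocation(stratum_sizes, n):
    """Give each stratum a share of n proportional to its size."""
    N = sum(stratum_sizes.values())
    # Rounding means the parts may not always sum exactly to n.
    return {name: round(size * n / N) for name, size in stratum_sizes.items()}


def equal_allocation(stratum_sizes, n):
    """Give every stratum the same sample size, whatever its population size."""
    per_stratum = n // len(stratum_sizes)
    return {name: per_stratum for name in stratum_sizes}


# Hypothetical strata sizes (N = 1000) and total sample size n = 120.
strata = {"A": 500, "B": 300, "C": 200}
proportional = proportional_allocation(strata, 120)
equal = equal_allocation(strata, 120)

print(proportional)  # {'A': 60, 'B': 36, 'C': 24}
print(equal)         # {'A': 40, 'B': 40, 'C': 40}
```

In the proportional case the sampling fraction is 120/1000 = 0.12 in every stratum; in the equal case it differs from stratum to stratum.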
When choosing among sampling methods, some reasons for using stratified sampling over simple random sampling are:
- The cost per observation in the survey may be reduced
- Accuracy may be increased at a given cost
Example of Stratified Random Sampling
Please have a look at the example of stratified sampling in the figure below, starting with the ‘particulars’ column. The first row refers to the scope of the population: all primary students in the local school district. The strata are the 20 different primary schools. Using simple random sampling, 50 students were selected from each of the 20 primary schools. The resulting sample therefore contains 1000 students from 20 primary schools.
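The school example can be sketched in Python as follows. The enrolment of 200 students per school, the student ID format and the seed are assumptions made for illustration; only the 20 schools and the 50 students drawn per school come from the example.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical district: 20 schools (strata) with 200 students each.
schools = {
    f"school_{i:02d}": [f"s{i:02d}_{j:03d}" for j in range(200)]
    for i in range(1, 21)
}

# Draw a simple random sample of 50 students from every school.
per_school = {name: rng.sample(students, k=50) for name, students in schools.items()}

# Combine the per-stratum samples into the overall stratified sample.
stratified_sample = [s for picked in per_school.values() for s in picked]

print(len(stratified_sample))  # 1000
```

Every school is guaranteed to appear in the sample, which is the defining feature of stratification.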
Probability Sampling Methods: Cluster Sampling
It is sometimes expensive to spread a sample across the population as a whole. When cost is a concern, we may choose cluster sampling. Cluster sampling divides the population into groups, or clusters. A number of clusters are selected at random, and all units within the selected clusters are included in the sample. The selected clusters are then used to represent the population.
No units from non-selected clusters are included in the sample. They are represented by those from selected clusters.
This differs from stratified sampling, where some units are selected from each group. Examples of clusters may be factories, schools and geographic areas such as electoral subdivisions.
Cluster sampling and stratified sampling are two very different sampling methods. With cluster sampling, one should:
- Divide the population into groups (clusters)
- Obtain a simple random sample of a chosen number of clusters from all possible clusters
- Obtain data on every sampling unit in each of the randomly selected clusters
Example of Cluster Sampling
It’s easier to explain the differences between sampling methods using the same example, so let us reuse the one from stratified sampling. The population again covers all primary students in the local school district. In the next row, the 20 different primary schools now serve as clusters rather than strata. Six primary schools were selected from the 20 using simple random sampling. Finally, the sample includes every student in the 6 selected primary schools.
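The cluster version of the school example can be sketched in Python in the same way. The school sizes, the student ID format and the seed are hypothetical; only the 20 schools and the 6 selected clusters come from the example.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical district: 20 schools (clusters) of differing sizes.
schools = {
    f"school_{i:02d}": [f"s{i:02d}_{j:03d}" for j in range(150 + 10 * i)]
    for i in range(1, 21)
}

# Step 1: simple random sample of 6 clusters from the 20 schools.
selected_schools = rng.sample(sorted(schools), k=6)

# Step 2: include every student in each selected school.
cluster_sample = [s for name in selected_schools for s in schools[name]]

print(selected_schools)
print(len(cluster_sample))
```

Unlike the stratified case, the final sample size is not fixed in advance: it depends on how large the selected schools happen to be, and the 14 schools that were not selected contribute no students at all.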
Probability Sampling Methods: Systematic Sampling
In systematic sampling, the whole sample selection is based on just a random start. The first unit is selected with the help of random numbers, and the rest are selected automatically according to a pre-designed pattern. With systematic random sampling, every Kth element in the frame is selected for the sample, with the starting point among the first K elements determined at random. Systematic sampling is frequently used when a complete list of the population is available, and it is widely employed because of its ease and convenience. It is also called quasi-random sampling or interval sampling.
This method is often used in industry, where an item is selected for testing from a production line at regular intervals (e.g. every fifteen minutes) to ensure that machines and equipment are working according to specifications. The technique can also be used when questioning people in a sample survey. A market researcher might select every 15th person who enters a particular store, after selecting a person at random as a starting point, or interview the occupants of every 7th house in a street, after selecting a house at random as a starting point. The researcher may also want a sample of a fixed size. In that case, the size of the whole population from which the sample is drawn must be known first.
Example of Systematic Sampling
For example, suppose we want to select a sample of 50 students from 500 students. Under this method, every Kth item is picked from the sampling frame, and ‘K’ is called the sampling interval. The sampling interval is K = N/n, where ‘N’ is the population size and ‘n’ is the sample size. Here, K = 500/50 = 10.
Systematic sampling consists of selecting a random number ‘i’ between 1 and K, and then every Kth unit after it. Suppose the random number ‘i’ is 5; then we select units 5, 15, 25, 35, 45, etc. The random number ‘i’ is called the ‘random start’. Because there are K possible random starts, the technique generates one of K possible systematic samples, each with equal probability. Systematic sampling is preferred when the information is to be collected from items that are in serial order, such as trees in a forest, houses in blocks or entries in a register.
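The procedure can be sketched in Python as below, using the 500-student frame and sample size of 50 from the example. The function name `systematic_sample`, the numbering of students from 1 to 500 and the seed are assumptions made for illustration.

```python
import random


def systematic_sample(frame, n, rng):
    """Pick every K-th unit after a random start among the first K units."""
    N = len(frame)
    k = N // n                 # sampling interval K = N / n
    start = rng.randint(1, k)  # random start i, between 1 and K inclusive
    # 1-based positions start, start + K, start + 2K, ...
    positions = range(start, N + 1, k)
    # The slice guards against an extra unit when N is not a multiple of n.
    return [frame[pos - 1] for pos in positions][:n]


# Hypothetical frame: students numbered 1..500 in list order.
frame = list(range(1, 501))
sample = systematic_sample(frame, 50, random.Random(7))

print(sample[:5])
print(len(sample))  # 50
```

With a random start of 5, this selection would yield 5, 15, 25, 35, 45 and so on, matching the sequence described above.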
Non-probability Sampling Methods
The second main type of sampling method is non-probability sampling. Here, discretion is used to select ‘representative’ units from the population, or to infer that a sample is ‘representative’ of the population. This method is called judgment or purposive sampling, and it is mainly used for opinion surveys.
A common type of judgment sample used in surveys is the quota sample. This method is not generally used because it is open to the prejudice and bias of the enumerator. However, if the enumerator is an experienced expert, it may yield valuable results. For example, in a market research survey on the performance of a new car, the sample was made up of all new car purchasers.
As far as possible, non-probability sampling methods should be avoided. They are based on human choice rather than random selection, so statistical theory cannot explain how they might behave, and potential sources of bias are rampant and uncontrollable. Always choose probability sampling methods where possible.