Introduction to Sample Size

In statistics, a population is the complete collection of elements relevant to a particular study. A sample is a carefully chosen subset of that population.

Sampling is a foundation of statistical analysis. Very often we cannot study the entire population. In such cases we draw a representative sample from the population and infer the unknown characteristics of the population solely on the basis of that sample.

Inference is one of the most important tasks of statistics, and drawing a valid sample is its first and foremost step.


Definition of Sample Size

Sample size in Statistics is defined as the number of elements that we wish to include in a sample.


Importance of Sample Size

Choosing the size of a sample is a crucial task.

One of the pivotal activities in designing a sample survey is deciding the proper sample size. The precision of parameter estimates and the power of hypothesis tests depend heavily on the size of the sample.

An important aspect:

[Figure: sample size 2.png]

Factors Influencing Sample Size

The choice of sample size depends on the following factors:

1. Population size

2. Level of significance and power of the test for which the sample is drawn

3. Standard deviation of the population

4. Underlying event rate of the random experiment or population

● Population size plays an important role in choosing the number of elements to draw. When estimating a characteristic of a very large population, the sample must contain a fairly large number of elements, otherwise it will not be representative. The required size does not grow in proportion to the population, however; for very large populations it levels off.

● The level of significance α is the probability of committing a Type I error, and the power 1 − β is the probability of rejecting a false null hypothesis. Both are usually fixed in advance by convention, according to how serious each kind of error is, so the sample size depends heavily on these two quantities.

● Standard deviation measures the variability within a population: the population SD tells us how scattered the values are. The greater the variability, the larger the sample needed; lower variability allows a smaller sample.

● The underlying event rate is the proportion of times a particular event is observed when the random experiment is repeated. It strongly affects the number of items to include in the sample.
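The effect of the standard deviation can be illustrated with a small sketch. It assumes the standard z-interval formula for estimating a mean, n = (z·σ / E)², which is not stated above; the function name n_for_mean and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, margin, confidence=0.95):
    """Smallest n for which a z-interval for the mean has half-width <= margin.

    Assumes a known population SD `sigma` and the formula n = (z * sigma / margin)^2.
    """
    # Two-sided standard normal score for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Doubling the population SD roughly quadruples the required sample size.
for sigma in (5, 10, 20):
    print(sigma, n_for_mean(sigma, margin=2))
```

Raising the confidence level in the same function also increases the required n, which reflects the role of the level of significance.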

 

 

Utility of Sample Size

Larger samples are generally recommended whenever cost and time allow.

Using a large sample has several benefits:

1. Large samples usually increase precision and give more reliable results, because the more elements of the population are included, the more representative the sample becomes.

2. A large sample reduces the sampling error, and it reduces the bias of estimators whose bias shrinks as the sample grows. It does not correct bias caused by a poorly drawn sample.

3. When the sample is sufficiently large, many useful approximations become available: for example, the normal approximation can be applied to the sample mean of a non-normal population, and the laws of large numbers can be invoked.

4. Consistent estimators converge to the true parameter value as the sample size increases, so a large sample yields more efficient inference.

5. For confidence intervals with a fixed confidence coefficient, sample size is a key factor: the larger the sample, the narrower and therefore more informative the interval.

The degrees of freedom of many statistical tests also depend on the sample size; for a one-sample t-test, for example, they equal the sample size minus 1.
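The gain in precision can be seen in a small simulation. This is only a sketch: the exponential population (mean 1, SD 1), the number of replications and the function name spread_of_sample_means are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def spread_of_sample_means(n, reps=1000, seed=0):
    """Empirical SD of the sample mean over `reps` samples of size n.

    Samples are drawn from a non-normal (exponential) population
    with mean 1 and SD 1.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.stdev(means)

# The spread of the sample mean should be close to sigma / sqrt(n) = 1 / sqrt(n).
for n in (25, 100, 400):
    print(n, round(spread_of_sample_means(n), 4), round(1 / math.sqrt(n), 4))
```

Each fourfold increase in n should roughly halve the spread of the sample mean, even though the population itself is not normal.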

Determination of Sample Size

Most of the influencing factors stated above play an important role in determining the sample size.

These include the power of the test, the variability of the population and the population size. Before starting a survey, we may also look at previous surveys to get an idea of a suitable sample size.

The calculation is usually based on the distribution of the underlying population from which the sample is drawn.

For example, consider a hypothesis test for the mean of a univariate normal population.

Let {X1, X2, ..., Xn} be a random sample taken from a normal population with unknown mean μ and known variance σ².

The null hypothesis is H0: μ = 0, tested against a simple alternative H1: μ = μ*, where μ* > 0.

Now, if we wish to

(1) reject H0 with a probability of at least 1 − β when H1 is true (i.e. a power of at least 1 − β), and

(2) reject H0 with probability α (i.e. the Type I error probability) when H0 is true, then we need the following:

If zα is the upper 100α percentage point of the standard normal distribution, then

$$P\left[\bar{x} > z_\alpha \, \sigma / \sqrt{n} \mid H_0 \text{ is true}\right] = \alpha$$

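Under H1, the power of this test is 1 − Φ(zα − √n μ*/σ). Requiring the power to be at least 1 − β and solving for n gives

$$n \ge \left( \frac{z_\alpha + \Phi^{-1}(1-\beta)}{\mu^* / \sigma} \right)^2$$

where Φ(·) is the cumulative distribution function of the standard normal distribution.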
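The bound can be evaluated numerically. The following is a minimal sketch for the one-sided test with known σ described above; the function names and the example values (μ* = 0.5, σ = 1, α = 0.05, power = 0.80) are assumptions for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

def n_for_one_sided_z_test(mu_star, sigma, alpha=0.05, power=0.80):
    """Smallest n with n >= ((z_alpha + Phi^{-1}(1 - beta)) / (mu_star / sigma))^2."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # upper 100*alpha percentage point
    z_power = STD_NORMAL.inv_cdf(power)      # Phi^{-1}(1 - beta), with power = 1 - beta
    return math.ceil(((z_alpha + z_power) / (mu_star / sigma)) ** 2)

def achieved_power(n, mu_star, sigma, alpha=0.05):
    """Power of the test at mu = mu_star: 1 - Phi(z_alpha - sqrt(n) * mu_star / sigma)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - math.sqrt(n) * mu_star / sigma)

n = n_for_one_sided_z_test(mu_star=0.5, sigma=1.0)
print(n, round(achieved_power(n, 0.5, 1.0), 4))
```

Checking the achieved power at n and at n − 1 is a simple way to confirm that the computed n is the smallest size meeting the power requirement.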

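Formula for Sample Size

A conventional formula for the sample size needed to estimate a proportion is:

Sample Size Calculation:

Sample Size = p(1 − p) / (Margin of Error / z)²

Here p is the expected proportion and z is the standard normal score for the chosen confidence level (for example, z ≈ 1.96 for 95% confidence). Taking p = 50% gives p(1 − p) = 0.25, the largest possible value and therefore the most conservative sample size. The margin of error is written as a proportion (5% = 0.05).

Finite Population Correction:

True Sample = (Sample Size × Population) / (Sample Size + Population − 1)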
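As a worked illustration, the two formulas can be applied in sequence. This is a sketch only: the function names, the 5% margin of error, the 95% confidence level and the population of 2000 are assumed example values.

```python
import math
from statistics import NormalDist

def sample_size(margin_of_error, confidence=0.95, p=0.5):
    """Uncorrected sample size: p(1 - p) / (margin_of_error / z)^2."""
    # Two-sided standard normal score for the confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p * (1 - p) / (margin_of_error / z) ** 2

def finite_population_correction(n, population):
    """True sample = (n * population) / (n + population - 1)."""
    return n * population / (n + population - 1)

n0 = sample_size(0.05)  # 5% margin of error, 95% confidence, p = 50%
print(math.ceil(n0))
print(math.ceil(finite_population_correction(n0, population=2000)))
```

For a very large population the correction changes almost nothing, so the uncorrected sample size can be used directly.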


Conclusion

In conclusion, no sample is perfect, and the maximum permissible error should be set by the experimenter.

The sample size should always be decided before the survey starts and should not be changed while the survey is running.