**Introduction to Sample Size**

In statistics, a **population** is the complete set of elements relevant to a
particular study. A **sample** is a subset of that population, chosen carefully
so that it represents the whole.

Sampling is a very important foundation of statistical analysis. Very often we
come across situations in which we cannot study the entire population. In such
cases we have to draw a representative sample from the population and infer the
unknown characteristics of the population solely on the basis of that sample.

Inference is one of the most important tasks of statistics, and drawing a valid sample is its first and foremost step.

**Definition of Sample Size**

**Sample size** in Statistics is defined as the number of elements that we
wish to include in a sample.

**Importance of Sample Size**

Choosing the size of a sample is a
crucial task.

One of the pivotal activities in
designing a sample survey is deciding the proper **sample size**.
The quality of parameter estimates and hypothesis tests depends heavily on the
size of the sample.

**Influencing factors of Sample Size**

The choice of **sample size**
depends on the following factors:

**1. Population size**

**2. Level of significance and power
of the test for which the sample is drawn.**

**3. Standard Deviation of the
Population**

**4. Underlying event rate of the
random experiment or population.**

● **Population size** plays an important role in choosing the number of
elements to be drawn in a sample. If we are estimating a characteristic of
**a very large population**, the **sample should contain a fairly large number
of elements**; otherwise it will not be well representative.

● The **level of significance and the power of a statistical test** are usually
chosen by convention, according to how serious it is to commit **a Type I
error, whose probability is α.** Because both are mostly fixed in advance, the
sample size depends heavily on these two quantities.

● **Standard deviation** measures the variability within a population. The
value of the population SD tells us how scattered the values in the population
are. **The greater the variability, the larger the sample required; lower
variability allows a smaller sample.**

● The **underlying event rate** is how often a particular event is observed
when a random experiment is performed. It strongly affects the number of items
to be included in the sample.

**Utility of Sample Size**

It is generally recommended to use
**large samples** where feasible.

There are several **advantages of using
a large sample**. They are as follows:

1. **Large samples usually increase
the precision and provide more reliable results**, because the more
elements of the population are included, the more representative the sample
becomes.

2. **They reduce the amount of bias
in estimation** and also the **sampling error.**

3. When the sample is
sufficiently large, **many useful approximations can be made**. For example,
the **normal approximation for non-normal populations** can be used and
**several laws of large numbers** can be applied to the cases of interest.

4. **Consistent
estimators** get closer to the true parameter value as the sample grows, so a
large sample yields more efficient inference.

5. For constructing **confidence
intervals with a fixed confidence
coefficient**, sample size is a key factor. The larger the sample, the
narrower and more reliable the confidence interval.

Also, **the degrees of freedom of
many statistical tests** depend on the sample size. For the one-sample t-test,
for example, they equal the sample size minus 1. This can be considered a
further utility.
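
The point about confidence intervals can be illustrated with a minimal sketch. It assumes a two-sided z-interval for a mean with known population SD, which is only one of several possible interval types; the function name `ci_half_width` and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def ci_half_width(sigma, n, confidence=0.95):
    """Half-width of a two-sided z-interval for a mean.

    Assumes the population SD `sigma` is known and the sample mean is
    (approximately) normal. These are illustrative assumptions.
    """
    # z-score that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)


# With the confidence coefficient fixed, the interval narrows as n grows.
for n in (25, 100, 400):
    print(n, round(ci_half_width(10.0, n), 3))
```

Because the half-width is proportional to 1/√n, quadrupling the sample size halves the width of the interval.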

**Determination of Sample size**

Most of the influencing factors
stated above play an important role in the **determination of sample size.**

Among them are the **power of the test,
the variability of the population and the population size**. Even before
starting a survey, we may look at previous surveys to get an idea of a
suitable sample size.

Usually the **distribution required
for sample size determination** is the **distribution of the underlying
population** from which the sample is to be drawn or has been drawn.

**For example,**

Let us consider a hypothesis test
regarding the mean of a univariate normal population.

Let **$\{X_1, X_2, \ldots, X_n\}$** be a random sample taken from a normal
population with **unknown mean $\mu$ and known variance $\sigma^2$.**

The null hypothesis is **$H_0: \mu = 0$**, tested against the simple
alternative **$H_1: \mu = \mu^*$**, where $\mu^* > 0$.

Now, suppose we wish to

(1) reject $H_0$ with a
probability of at least $1-\beta$ when $H_1$ is true (i.e. the power of the test), and

(2) reject $H_0$ with
probability $\alpha$ (i.e. the probability of a Type I error) when $H_0$ is true.

Then we need the following:

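If **$z_\alpha$** is the
**upper $\alpha$ point of the standard normal distribution** and $\bar{X}$ is the sample mean, then

$$P\left[\bar{X} > z_\alpha \frac{\sigma}{\sqrt{n}} \,\middle|\, H_0 \text{ is true}\right] = \alpha$$

Under $H_1$ the same rejection region has power
$1 - \Phi\left(z_\alpha - \sqrt{n}\,\mu^*/\sigma\right)$. Requiring this to be
at least $1-\beta$ and solving for $n$, it can be shown that

$$n \ge \left(\frac{z_\alpha + \Phi^{-1}(1-\beta)}{\mu^*/\sigma}\right)^2$$

where $\Phi(\cdot)$ is the cumulative distribution
function of the standard normal distribution.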
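
The bound on n above can be evaluated numerically. The following is a minimal sketch for the one-sided test of a normal mean with known SD described in this example; the function names and the example values (μ* = 0.5, σ = 1, α = 0.05, β = 0.20) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def sample_size_one_sided_z(mu_star, sigma, alpha=0.05, beta=0.20):
    """Smallest integer n with n >= ((z_alpha + Phi^{-1}(1 - beta)) / (mu*/sigma))^2."""
    if mu_star <= 0 or sigma <= 0:
        raise ValueError("mu_star and sigma must be positive")
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # upper alpha point
    z_beta = STD_NORMAL.inv_cdf(1 - beta)    # Phi^{-1}(1 - beta)
    return math.ceil(((z_alpha + z_beta) / (mu_star / sigma)) ** 2)


def power_one_sided_z(n, mu_star, sigma, alpha=0.05):
    """Power of the test that rejects H0 when x_bar > z_alpha * sigma / sqrt(n)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(z_alpha - math.sqrt(n) * mu_star / sigma)


n = sample_size_one_sided_z(mu_star=0.5, sigma=1.0)
print(n)  # 25
print(power_one_sided_z(n, 0.5, 1.0) >= 0.80)      # True
print(power_one_sided_z(n - 1, 0.5, 1.0) >= 0.80)  # False
```

The second function checks the result: the computed n reaches the required power, while one observation fewer does not.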

**Formula of Sample Size**

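The conventional formula for **calculating
sample size** when estimating a proportion is:

Sample Size = (Distribution of 50%)
/ ((Margin of Error% / Confidence Level Score)^{2})

Here "Distribution of 50%" is p(1 − p) evaluated at p = 0.5, i.e. 0.25. The
margin of error is expressed as a proportion, and the confidence level score
is the z-score for the chosen confidence level (about 1.96 for 95%).

● **Finite Population Correction:**

True Sample = (Sample Size × Population) /
(Sample Size + Population − 1)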
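
The two formulas above can be sketched in code as follows. This is an illustrative sketch only: it assumes a two-sided confidence level, a margin of error given as a proportion (0.05 for 5%), and a hypothetical population of 2000; the function names are not from any particular library.

```python
import math
from statistics import NormalDist


def sample_size_proportion(margin_of_error, confidence=0.95, p=0.5):
    """Sample Size = p(1 - p) / (margin of error / z-score)^2."""
    # z-score ("confidence level score") for a two-sided confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p * (1 - p) / (margin_of_error / z) ** 2


def finite_population_correction(sample_size, population):
    """True Sample = (Sample Size x Population) / (Sample Size + Population - 1)."""
    return sample_size * population / (sample_size + population - 1)


n0 = sample_size_proportion(0.05)               # 5% margin, 95% confidence
n_adj = finite_population_correction(n0, 2000)  # hypothetical population of 2000

# Round up, since a fraction of a respondent cannot be sampled.
print(math.ceil(n0), math.ceil(n_adj))  # 385 323
```

The correction matters mainly when the population is small relative to the uncorrected sample size; for very large populations the two numbers nearly coincide.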

**Conclusion**

Finally, we can say that **no sample
is perfect** and **the maximum permissible limit of error** should be
determined by the experimenter.

The **sample size** should always be decided before the start of the survey and should not be changed while the survey is running.