1. What is Central Limit Theorem?
The Central Limit Theorem is the backbone of probability theory. It shows that, for a fairly large sample size ‘n’ drawn from a population with finite variance, the distribution of the sample means is approximately normal and is centred on the population mean.
What does CLT stand for?
CLT stands for Central Limit Theorem
What makes the Central Limit Theorem so important?
The theorem shows that, regardless of the underlying distribution, when ‘n’ is large and the mean and variance are finite, the arithmetic mean of the sample follows an approximately normal distribution. CLT tells us about the shape of the distribution of the mean over repeated trials.
Condition:
CLT requires linear, additive behavior of variables involved.
2. Definition of Central Limit Theorem
In the mathematical theory of probability, the Central Limit Theorem is expressed as follows.
Let X_{i} (i = 1, 2, ..., n) be independent random variables with E(X_{i}) = µ_{i} and Var(X_{i}) = σ_{i}^{2}. Then, under certain general conditions, the random variable S_{n} = X_{1} + X_{2} + ... + X_{n} is asymptotically normal with mean µ and standard deviation σ, where µ = Σµ_{i} and σ^{2} = Σσ_{i}^{2}.
The above definition also answers the frequently asked question "What is the Central Limit Theorem in Statistics?"
Laplace was the first to state this theorem, in 1812, and a proof under general conditions was given by Liapounoff in 1901. More generally, the name Central Limit Theorem covers a set of weak-convergence theorems used in probability theory.
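As a rough numerical illustration of the definition above, the following Python sketch sums independent but non-identical variables and compares the simulated mean and variance of S_{n} with Σµ_{i} and Σσ_{i}^{2}. The choice of Uniform(0, b_{i}) variables, the widths b_{i}, the function name simulate_sums and the seed are assumptions made only for this illustration; they are not part of the theorem.

```python
import random
import statistics


def simulate_sums(n_terms=30, n_trials=20000, seed=0):
    """Simulate S_n = X_1 + ... + X_n for independent, non-identical X_i.

    Assumption for illustration: X_i ~ Uniform(0, b_i) with b_i cycling 1, 2, 3.
    """
    rng = random.Random(seed)
    widths = [1 + (i % 3) for i in range(n_terms)]
    mu = sum(b / 2 for b in widths)         # sum of the individual means mu_i
    var = sum(b * b / 12 for b in widths)   # sum of the individual variances sigma_i^2
    sums = [sum(rng.uniform(0, b) for b in widths) for _ in range(n_trials)]
    return sums, mu, var


sums, mu, var = simulate_sums()
sigma = var ** 0.5

# Fraction of simulated sums within one sigma of mu; a normal variate gives about 0.683.
within_one_sigma = sum(abs(s - mu) <= sigma for s in sums) / len(sums)

print("simulated mean:", statistics.fmean(sums), "theoretical:", mu)
print("simulated variance:", statistics.pvariance(sums), "theoretical:", var)
print("share within one sigma:", within_one_sigma)
```

The simulated mean and variance should land close to the theoretical sums, and the share of values within one standard deviation should be near the normal value, which is what asymptotic normality suggests for a moderately large n.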
3. Some particular cases of C.L.T.
Here, we consider some particular cases of the Central Limit Theorem:
De Moivre-Laplace Theorem
If X_{i} = 1 with probability p and X_{i} = 0 with probability q = 1 − p, then the distribution of the random variable S_{n} = X_{1} + X_{2} + ... + X_{n}, where the X_{i}’s are independent, is asymptotically normal as n → ∞.
Proof: The M.G.F. of X_{i} is
M_{X_{i}}(t) = E(e^{tX_{i}}) = e^{t·1}p + e^{t·0}q = (q + pe^{t}) (*)
The M.G.F. of the sum S_{n} = X_{1} + X_{2} + ... + X_{n} is given by
M_{S_{n}}(t) = M_{X_{1}+X_{2}+...+X_{n}}(t) = M_{X_{1}}(t) M_{X_{2}}(t) ... M_{X_{n}}(t) = [M_{X_{i}}(t)]^{n} (since the X_{i}’s are independent and identically distributed)
= [q + pe^{t}]^{n}
which is the M.G.F. of a binomial variate with parameters ‘n’ and ‘p’.
Hence, by the uniqueness theorem of M.G.F.’s, S_{n} ~ B(n, p)
∴ E(S_{n}) = np = μ (say) and V(S_{n}) = npq = σ^{2}
Let Z = (S_{n} − μ)/σ. Then
M_{Z}(t) = e^{−µt/σ} M_{S_{n}}(t/σ) = e^{−npt/√(npq)} [q + pe^{t/√(npq)}]^{n}
= [1 + t^{2}/2n + O(n^{−3/2})]^{n}
where O(n^{−3/2}) represents terms involving n^{3/2} and higher powers of n in the denominator. Proceeding to the limit as n → ∞,
lim_{n→∞} M_{Z}(t) = lim_{n→∞} [1 + t^{2}/2n + O(n^{−3/2})]^{n} = lim_{n→∞} [1 + t^{2}/2n]^{n} = e^{t^{2}/2}
which is the M.G.F. of a standard normal variate.
Remarks: This theorem shows that the standardized binomial variate tends to the standard normal variate as n → ∞ or, in simple words, the Binomial Distribution tends to the Normal Distribution as n → ∞.
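The remark above can be checked numerically. The sketch below compares the exact B(n, p) distribution function with its normal approximation N(np, npq) and reports the largest gap over all k. The value p = 0.3, the sample sizes, the helper names and the continuity correction of 0.5 are assumptions added for this illustration and are not taken from the theorem.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(S_n = k) for S_n ~ B(n, p), computed through logarithms to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def max_cdf_gap(n, p):
    """Largest absolute gap between the B(n, p) CDF and the N(np, npq) CDF."""
    normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
    cdf, gap = 0.0, 0.0
    for k in range(n + 1):
        cdf += binomial_pmf(k, n, p)
        # k + 0.5 is a continuity correction (an assumption of this sketch).
        gap = max(gap, abs(cdf - normal.cdf(k + 0.5)))
    return gap


gaps = [max_cdf_gap(n, 0.3) for n in (10, 100, 1000)]
for n, gap in zip((10, 100, 1000), gaps):
    print(n, gap)
```

The gap should shrink as n grows, which is the practical content of the De Moivre-Laplace theorem.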
4. Lyapunov’s CLT
This theorem is named after the Russian mathematician Aleksandr Lyapunov. It states that:
Let X_{1}, X_{2}, X_{3}, ... be independent r.v.’s (not necessarily identically distributed), each with finite expected value μ_{i} and variance σ_{i}^{2}, and suppose the X_{i} have moments of some order (2 + δ) whose rate of growth is limited by the Lyapunov condition below. Define
s_{n}^{2} = Σ_{i=1}^{n} σ_{i}^{2}
If, for some δ > 0, Lyapunov’s condition
lim_{n→∞} (1/s_{n}^{2+δ}) Σ_{i=1}^{n} E[|X_{i} − μ_{i}|^{2+δ}] = 0
is satisfied, then the sum Σ_{i=1}^{n} (X_{i} − μ_{i})/s_{n} converges in distribution to a standard normal variate as n → ∞.
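To make Lyapunov’s condition concrete, the sketch below evaluates the ratio Σ E|X_{i} − μ_{i}|^{2+δ} divided by (Σσ_{i}^{2})^{(2+δ)/2} for a hypothetical family of non-identical variables and shows it shrinking as n grows. The choice X_{i} ~ Uniform(−a_{i}, a_{i}) with a_{i} cycling through 1, 2, 3, the value δ = 1 and the function name lyapunov_ratio are assumptions made only for this example.

```python
def lyapunov_ratio(half_widths, delta=1.0):
    """Lyapunov ratio for independent X_i ~ Uniform(-a_i, a_i).

    For this family mu_i = 0, sigma_i^2 = a_i^2 / 3 and
    E|X_i|^(2 + delta) = a_i^(2 + delta) / (3 + delta).
    """
    r = 2.0 + delta
    sum_abs_moments = sum(a ** r / (r + 1.0) for a in half_widths)
    sum_variances = sum(a * a / 3.0 for a in half_widths)
    return sum_abs_moments / sum_variances ** (r / 2.0)


ratios = []
for n in (10, 100, 1000, 10000):
    half_widths = [1 + (i % 3) for i in range(n)]  # hypothetical, bounded a_i
    ratios.append(lyapunov_ratio(half_widths))
    print(n, ratios[-1])
```

Because the a_{i} are bounded, the numerator grows like n while the denominator grows like n^{3/2}, so the ratio decays roughly like n^{−1/2} and the condition is met for this family.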
5. Lindeberg-Levy Theorem
Lindeberg and Levy proved the following case of the Central Limit Theorem:
“If X_{1}, X_{2}, X_{3}, ..., X_{n} are independently and identically distributed random variables with E(X_{i}) = μ_{1} and V(X_{i}) = σ_{1}^{2}, then S_{n} = X_{1} + X_{2} + ... + X_{n} is asymptotically normal with mean μ = nμ_{1} and variance σ^{2} = nσ_{1}^{2}.”
Here the assumptions made are:
 The variables are independently and identically distributed.
 E(X_{i}^{2}) exists for all i = 1, 2, 3, ..., n.
Proof: Let M_{1}(t) denote the M.G.F. of each of the deviations (X_{i} − μ_{1}) and M_{Z}(t) denote the M.G.F. of the standardized variate Z = (S_{n} − μ)/σ.
The first two moments μ_{1}’ and μ_{2}’ (about origin) of the deviation (X_{i} − μ_{1}) are given by
μ_{1}’ = E(X_{i} − μ_{1}) = 0
μ_{2}’ = E(X_{i} − μ_{1})^{2} = σ_{1}^{2}
We have
M_{1}(t) = 1 + μ_{1}’t + μ_{2}’t^{2}/2! + μ_{3}’t^{3}/3! + ... = 1 + σ_{1}^{2}t^{2}/2 + O(t^{3})
where O(t^{3}) contains terms with t^{3} and higher powers of t.
Now Z = (S_{n} − μ)/σ = Σ(X_{i} − μ_{1})/σ with σ = √n σ_{1}, and since the X_{i}’s are independent, we get
M_{Z}(t) = M_{Σ(X_{i} − μ_{1})/σ}(t) = M_{Σ(X_{i} − μ_{1})}(t/σ) = ∏ M_{(X_{i} − μ_{1})}(t/σ) = [M_{1}(t/σ)]^{n} = [M_{1}(t/(√n σ_{1}))]^{n}
= [1 + t^{2}/2n + O(n^{−3/2})]^{n}
where O(n^{−3/2}) represents terms involving n^{3/2} and higher powers of n in the denominator. Proceeding to the limit as n tends to infinity,
lim_{n→∞} M_{Z}(t) = lim_{n→∞} [1 + t^{2}/2n + O(n^{−3/2})]^{n} = lim_{n→∞} [1 + t^{2}/2n]^{n} = e^{t^{2}/2}
which is the M.G.F. of a standard normal variate. Hence, by the uniqueness theorem of M.G.F.’s, Z is asymptotically standard normal, i.e. S_{n} = X_{1} + X_{2} + ... + X_{n} is asymptotically normal with mean μ = nμ_{1} and variance σ^{2} = nσ_{1}^{2}.
If
a sequence satisfies Lyapunov’s condition it also satisfies Lindeberg’s
condition. However, the converse is not true.
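The limit at the heart of the Lindeberg-Levy proof can be watched numerically in a case where the M.G.F. of the standardized sum has a closed form. The sketch below assumes i.i.d. Exponential(1) variates (so μ_{1} = 1 and σ_{1} = 1), for which S_{n} is a gamma variate and the M.G.F. of Z = (S_{n} − n)/√n is e^{−t√n}(1 − t/√n)^{−n} for t < √n. The choice of distribution, the value t = 1 and the function name are assumptions of this example, not part of the proof.

```python
from math import exp, log1p, sqrt


def standardized_mgf_exponential(t, n):
    """M.G.F. at t of Z = (S_n - n)/sqrt(n), S_n a sum of n i.i.d. Exponential(1) variates."""
    root_n = sqrt(n)
    if t >= root_n:
        raise ValueError("this M.G.F. exists only for t < sqrt(n)")
    # log M_Z(t) = -t*sqrt(n) - n*log(1 - t/sqrt(n)); work in logs for numerical stability.
    return exp(-root_n * t - n * log1p(-t / root_n))


t = 1.0
limit = exp(t * t / 2)  # M.G.F. of the standard normal variate at t
errors = []
for n in (4, 100, 10_000, 1_000_000):
    value = standardized_mgf_exponential(t, n)
    errors.append(abs(value - limit))
    print(n, value, limit)
```

The printed values should approach e^{t^{2}/2} as n increases, mirroring the step lim [1 + t^{2}/2n]^{n} = e^{t^{2}/2} in the proof.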
6. Liapounoff’s Central Limit Theorem
This is the Central Limit Theorem for the generalized case when the variables are not identically distributed and some further conditions are imposed.
Let X_{1}, X_{2}, X_{3}, ..., X_{n} be independent random variables such that E(X_{i}) = μ_{i} and V(X_{i}) = σ_{i}^{2}.
Suppose that the third absolute moment, say ρ_{i}^{3}, of X_{i} about its mean exists, i.e.
ρ_{i}^{3} = E{|X_{i} − μ_{i}|^{3}} is finite for i = 1, 2, 3, ..., n.
Further, let ρ^{3} = ∑ρ_{i}^{3}.
If lim_{n→∞} ρ/σ = 0, the sum X = X_{1} + X_{2} + ... + X_{n} is asymptotically N(μ, σ^{2}), where μ = ∑μ_{i} and σ^{2} = ∑σ_{i}^{2}.
Remarks about Liapounoff’s theorem:
If the variables X_{1}, X_{2}, X_{3}, ..., X_{n} are identically distributed with a finite third absolute moment, then ρ^{3} = ∑ρ_{i}^{3} = nρ_{1}^{3} and σ^{2} = ∑σ_{i}^{2} = nσ_{1}^{2}, so that ρ/σ = (n^{1/3}ρ_{1})/(n^{1/2}σ_{1}) = ρ_{1}/(σ_{1}n^{1/6}) → 0 as n → ∞.
Hence, for identical variables, the conditions of Liapounoff’s theorem are satisfied.
It may be pointed out that the Lindeberg-Levy theorem is not a particular case of Liapounoff’s theorem, since the former does not assume the existence of the third moment.
7. Different forms of C.L.T
C.L.T can be stated in other forms
too which are as follows:
Remarks:
 It does not matter if we change non-strict inequalities to strict inequalities P [a< (.)
 CLT gives a good approximation in the binomial case when p = ½. For p near 0 or 1, the CLT approximation still holds, but ‘n’ has to be sufficiently large.
8. Applications of Central Limit Theorem
(a) If X_{1}, X_{2}, X_{3}, ... are i.i.d. B(r, p) variates and S_{n} = X_{1} + X_{2} + ... + X_{n}, then
E(S_{n}) = Σ E(X_{i}) = nrp
and V(S_{n}) = Σ V(X_{i}) = nrpq,
where the sums run over i = 1, 2, ..., n.
(b) Let X_{1}, X_{2}, ... be i.i.d. Bernoulli variates, i.e. B(1, p). Then S_{n} = X_{1} + X_{2} + ... + X_{n} ~ B(n, p). Hence, we get in (*)
(c) If Y_{n} is distributed as P(n) then,
Thus, for instance,
lim_{n→∞} P(Y_{n} ≤ n) = ½
i.e. Σ_{k=0}^{n} e^{−n}n^{k}/k! → ½ as n → ∞
Proof: Let X_{1}, X_{2}, ... be i.i.d. P(1) variates.
Then S_{n} = X_{1} + X_{2} + ... + X_{n} ~ P(n)
=> Y_{n} = S_{n}
In particular, take a = −∞, b = 0.
We get
P(Y_{n} ≤ n) → ½ as n → ∞
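The limit in (c) is easy to check numerically. The sketch below evaluates P(Y_{n} ≤ n) for a Poisson variate with mean n by summing the probabilities e^{−n}n^{k}/k! for k = 0, ..., n, working with logarithms so that large n does not overflow. The function name and the chosen values of n are assumptions of this illustration.

```python
from math import exp, lgamma, log


def poisson_cdf_at_mean(n):
    """P(Y_n <= n) for Y_n ~ Poisson(n), i.e. the sum over k = 0..n of e^(-n) n^k / k!."""
    log_n = log(n)
    # lgamma(k + 1) equals log(k!); each term is exponentiated only after combining the logs.
    return sum(exp(-n + k * log_n - lgamma(k + 1)) for k in range(n + 1))


values = [poisson_cdf_at_mean(n) for n in (1, 10, 100, 1000, 10000)]
for n, value in zip((1, 10, 100, 1000, 10000), values):
    print(n, value)
```

The probabilities should decrease towards ½ as n grows, in line with the CLT argument in the proof.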
Relationship between Central Limit Theorem and Weak Law of Large Numbers
1. Both the Central Limit Theorem and the Weak Law of Large Numbers (WLLN) hold for a sequence of i.i.d. random variables with finite mean μ and variance σ^{2}. However, in this case the CLT gives a stronger result than the WLLN, in the sense that the former provides an estimate of P[|S_{n} − nμ|/n ≥ ε], as given below:
P[|S_{n} − nμ|/n ≥ ε] ≈ 2[1 − Φ(ε√n/σ)]
where Φ(.) is the distribution function of the standard normal variate. However, the WLLN does not require the existence of the variance.
2. For a sequence {X_{n}} of independent and uniformly bounded r.v.’s, the WLLN holds, and the CLT holds in this case provided B_{n} = Var(X_{1} + X_{2} + ... + X_{n}) = σ_{1}^{2} + σ_{2}^{2} + ... + σ_{n}^{2} → ∞ as n → ∞.
3. For a sequence {X_{n}} of independent r.v.’s, the CLT may hold while the WLLN does not.
9. Uses of CLT
 CLT explains why many commonly used estimators follow an "approximately normal" distribution, which means that tables of values (or built-in functions in statistical software and programming languages such as ‘R’ or ‘C’) can be used to construct confidence intervals and approximate p-values. It is very practical.
 CLT can be applied to almost any kind of distribution. It allows statisticians to develop standardized methods that derive useful information from almost any sample by means of CLT-based statistics and hypothesis tests.
 Under certain conditions, CLT also holds for variables which are not independent.
10. Examples of Central Limit Theorem
1. Suppose a school has 1200 students, with 200 each in grades 7 to 12. Here, each student contributes to the marks and each student’s marks are independent of the others’.
Suppose we take 10 samples of 25 students each and find the mean grade of each sample. The first sample has a mean of 9.52, the second a mean of 9.32, and so on. The mean grades of the samples are shown in the table below:
Sample (n = 25)    Average Grade
1                  9.52
2                  9.32
3                  9.08
4                  8.80
5                  9.48
6                  9.36
7                  9.48
8                  10.12
9                  9.64
10                 9.35
When samples are taken repeatedly and the mean of each is calculated, the means start to form their own distribution. This distribution is the sampling distribution, because it represents the distribution of estimates from the population over repeated samples. In such a case, a histogram of the sample means of, say, 1,000 samples would appear roughly bell-shaped.
The shape of the distribution of the means of samples of size 25 looks a bit like a Gaussian distribution (Normal Distribution), even though the original distribution is uniform, and the shape of the distribution of sample means tends more towards the normal distribution as we keep increasing ‘n’.
Together with the central limit theorem, it is easy to show that the mean of this sampling distribution will be the population mean and that its variance is equal to the population variance divided by n. The square root of this variance, i.e. the population standard deviation divided by √n, is known as the standard error. To conclude, this example shows that the mean of the sample means will be equal to the population mean, and that the variance of the sample means will get smaller with
i) a decrease in the population variance, or
ii) an increase in the sample size.
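The school example can be imitated with a short simulation. The sketch below builds a hypothetical population of 1200 marks, draws 1,000 samples of 25, and compares the mean and variance of the sample means with the population mean and with the population variance divided by n. The uniform marks on 0 to 19, the seed, and sampling with replacement (so that no finite-population correction is needed) are assumptions of this sketch, not data from the example.

```python
import random
import statistics
from collections import Counter

rng = random.Random(42)

# Hypothetical population: 1200 integer marks, uniform on 0..19 (not the article's data).
population = [rng.randint(0, 19) for _ in range(1200)]
pop_mean = statistics.fmean(population)
pop_var = statistics.pvariance(population)

n = 25            # students per sample, as in the example
n_samples = 1000  # number of repeated samples

# Sampling with replacement keeps Var(sample mean) = pop_var / n exactly.
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(n_samples)]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)
standard_error = (pop_var / n) ** 0.5

print("population mean:", pop_mean, "mean of sample means:", mean_of_means)
print("population variance / n:", pop_var / n, "variance of sample means:", var_of_means)
print("standard error:", standard_error)

# Crude text histogram of the sample means, one bar per rounded value.
bins = Counter(round(m) for m in sample_means)
for value in sorted(bins):
    print(f"{value:>3} {'#' * (bins[value] // 10)}")
```

Although the population is flat, the bars should pile up around the population mean in a roughly bell-shaped pattern, and the spread of the sample means should be close to the standard error.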
2. Let X_{1}, X_{2}, ... be i.i.d. Poisson variates with parameter λ. Use the central limit theorem to estimate
P(120 ≤ S_{n} ≤ 160), where S_{n} = X_{1} + X_{2} + ... + X_{n}, λ = 2 and n = 75.
Solution:
Since the X_{i}’s are i.i.d. P(λ),
E(X_{i}) = λ and Var(X_{i}) = λ, so that
E(S_{n}) = nλ and Var(S_{n}) = Var(X_{1} + X_{2} + ... + X_{n}) = nλ
Hence, by the Lindeberg-Levy CLT (for large n),
S_{n} ~ N(nλ, nλ) = N(µ = 150, σ^{2} = 150); n = 75, λ = 2
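The remaining step, turning the normal approximation into a number, can be sketched in Python. The code evaluates P(120 ≤ S_{n} ≤ 160) under N(150, 150) as a difference of two values of the normal distribution function. No continuity correction is applied, which is an assumption of this sketch, since the solution above does not say whether one is intended.

```python
from statistics import NormalDist

lam, n = 2, 75                 # parameter and sample size from the example
mu = n * lam                   # E(S_n) = n * lambda = 150
sigma = (n * lam) ** 0.5       # sqrt(Var(S_n)) = sqrt(150)

approx = NormalDist(mu, sigma)
prob = approx.cdf(160) - approx.cdf(120)

print("z-limits:", (120 - mu) / sigma, (160 - mu) / sigma)
print("P(120 <= S_n <= 160) is approximately", prob)
```

With these values the limits correspond to about −2.45 and 0.82 standard deviations, so the estimate comes out a little under 0.79.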
3. The probability distribution of the total distance covered in a random walk (biased or unbiased) will tend towards a normal distribution.
4. Flipping a coin a large number of times n will result in an approximately normal distribution for the total number of heads (or, equivalently, the total number of tails).
Try yourself:
 A distribution with unknown mean μ has variance equal to 1.5. Use the central limit theorem to find how large a sample should be taken from the distribution in order that the probability will be at least 0.95 that the sample mean will be within 0.5 of the population mean.
 The lifetime of a certain brand of electric tube light is considered a random variable with mean 1,200 hours and standard deviation 250 hours. Using the central limit theorem, find the probability that the average lifetime of 60 tube lights exceeds 1,400 hours.
Sir Francis Galton was an English Victorian statistician, progressive, polymath, sociologist, psychologist, anthropologist, eugenicist, tropical explorer, geographer, inventor, meteorologist, proto-geneticist, and psychometrician, knighted in 1909. He produced over 340 papers and books, created the statistical concept of correlation, and widely promoted regression towards the mean. He was the first to apply statistical methods to the study of human differences and the inheritance of intelligence, and he introduced the use of questionnaires and surveys for collecting data on human communities, which he needed for genealogical and biographical works and for his anthropometric studies. He wrote:
"I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the “Law of Frequency of Error”. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme Law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along".