1. What is Central Limit Theorem?

The Central Limit Theorem is one of the cornerstones of probability theory. It shows that, for a fairly large sample size ‘n’ drawn from a population with finite variance, the means of samples taken from the population are approximately normally distributed around the population mean.

Question:

What does CLT stand for?

CLT stands for Central Limit Theorem.

Question:

What makes the Central Limit Theorem so important?

This theorem shows that, regardless of the underlying distribution, when ‘n’ is large and the mean and variance are finite, the arithmetic mean of the sample approximately follows a normal distribution. CLT thus tells us about the shape of the distribution of the mean over repeated trials.

Condition: CLT requires linear, additive behavior of variables involved.
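The claim above can be checked with a small simulation. The following is a minimal sketch, not part of the original text: it assumes a skewed Exponential population with mean 1 and variance 1, and the helper names `sample_means` and `draw_exponential` are illustrative only. It standardizes the sample means and counts how many fall within 1 and 2 standard deviations, which for a standard normal variate would be about 68.3% and 95.4%.

```python
import random
import statistics

def sample_means(draw, n, trials, seed=0):
    """Return `trials` sample means, each computed from `n` draws of `draw(rng)`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]

def draw_exponential(rng):
    # Assumed population: Exponential with rate 1 (mean 1, variance 1), clearly non-normal.
    return rng.expovariate(1.0)

n, trials = 50, 5000
means = sample_means(draw_exponential, n, trials)

# Standardize each mean: (mean - mu) / (sigma / sqrt(n)) with mu = sigma = 1.
z = [(m - 1.0) * n ** 0.5 for m in means]

within_1 = sum(abs(v) <= 1 for v in z) / trials
within_2 = sum(abs(v) <= 2 for v in z) / trials
print(f"mean of sample means = {statistics.fmean(means):.3f}")
print(f"share within 1 s.d.  = {within_1:.3f}")
print(f"share within 2 s.d.  = {within_2:.3f}")
```

Increasing `n` should bring the two shares closer to the normal values, while a very small `n` leaves the skewness of the population visible.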

2. Definition of Central Limit Theorem

In the mathematical theory of probability, the Central Limit Theorem is expressed as:

Let $X_i$ $(i = 1, 2, \dots, n)$ be independent random variables with $E(X_i) = \mu_i$ and $\operatorname{Var}(X_i) = \sigma_i^2$. Then, under certain general conditions, the random variable $S_n = X_1 + X_2 + \dots + X_n$ is asymptotically normal with mean $\mu$ and standard deviation $\sigma$, where $\mu = \sum_{i=1}^{n} \mu_i$ and $\sigma^2 = \sum_{i=1}^{n} \sigma_i^2$.

The above definition is also an answer to the well-known and frequently asked question "What is the Central Limit Theorem in Statistics?"

Laplace was the first to state this theorem in 1812, and a proof under general conditions was given by Liapounoff in 1901. More generally, the term Central Limit Theorem refers to a family of weak-convergence theorems used in probability theory.

3. Some particular cases of C.L.T.

Here, we consider some particular cases of the Central Limit Theorem:

De Moivre-Laplace Theorem

If

$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } q = 1 - p \end{cases}$$

then the distribution of the random variable $S_n = X_1 + X_2 + \dots + X_n$, where the $X_i$’s are independent, is asymptotically normal as $n \to \infty$.

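Proof: The M.G.F. of $X_i$ is

$$M_{X_i}(t) = E(e^{tX_i}) = e^{t \cdot 1} p + e^{t \cdot 0} q = q + p e^{t} \qquad (*)$$

The M.G.F. of the sum $S_n = X_1 + X_2 + \dots + X_n$ is given by

$$M_{S_n}(t) = M_{X_1 + X_2 + \dots + X_n}(t) = M_{X_1}(t)\, M_{X_2}(t) \cdots M_{X_n}(t) = [M_{X_i}(t)]^n = (q + p e^{t})^n$$

(since the $X_i$’s are independent and identically distributed),

which is the M.G.F. of a binomial variate with parameters ‘n’ and ‘p’.

Hence, by the uniqueness theorem of M.G.F.’s, $S_n \sim B(n, p)$, so that

$$E(S_n) = np = \mu \text{ (say)} \quad \text{and} \quad V(S_n) = npq = \sigma^2.$$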

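Let $Z = \dfrac{S_n - \mu}{\sigma} = \dfrac{S_n - np}{\sqrt{npq}}$. Then

$$M_Z(t) = e^{-\mu t/\sigma}\, M_{S_n}(t/\sigma) = e^{-npt/\sqrt{npq}} \left[ q + p e^{t/\sqrt{npq}} \right]^n = \left[ q e^{-pt/\sqrt{npq}} + p e^{qt/\sqrt{npq}} \right]^n = \left[ 1 + \frac{t^2}{2n} + O(n^{-3/2}) \right]^n,$$

where $O(n^{-3/2})$ represents terms involving $n^{3/2}$ and higher powers of $n$ in the denominator. Proceeding to the limit as $n \to \infty$,

$$\lim_{n \to \infty} M_Z(t) = \lim_{n \to \infty} \left[ 1 + \frac{t^2}{2n} + O(n^{-3/2}) \right]^n = \lim_{n \to \infty} \left[ 1 + \frac{t^2}{2n} \right]^n = e^{t^2/2},$$

which is the M.G.F. of the standard normal variate. Hence, by the uniqueness theorem of M.G.F.’s, $Z = (S_n - np)/\sqrt{npq}$ is asymptotically $N(0, 1)$ as $n \to \infty$; equivalently, $S_n$ is asymptotically $N(\mu, \sigma^2)$.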

Remarks: This theorem shows that the standardized binomial variate tends to the standard normal variate as n → ∞; in simple words, the Binomial Distribution tends to the Normal Distribution as n → ∞.
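The limit used in the proof can also be checked numerically. The sketch below is an illustration under assumed values (t = 1 and p = 0.3 are arbitrary choices, and the function name `mgf_standardized_binomial` is hypothetical): it evaluates the M.G.F. of the standardized binomial variate Z = (Sn − np)/√(npq) for growing n and compares it with the standard normal M.G.F., exp(t²/2).

```python
from math import exp, sqrt

def mgf_standardized_binomial(t, n, p):
    """M.G.F. of Z = (S_n - n*p) / sqrt(n*p*q) at t, where S_n ~ B(n, p)."""
    q = 1.0 - p
    s = sqrt(n * p * q)
    # e^{-npt/s} * [q + p e^{t/s}]^n, written as one n-th power to keep the numbers moderate
    base = exp(-p * t / s) * (q + p * exp(t / s))
    return base ** n

t, p = 1.0, 0.3                 # assumed values, chosen only for illustration
limit = exp(t * t / 2)          # M.G.F. of N(0, 1) at t
for n in (10, 100, 1000, 100000):
    value = mgf_standardized_binomial(t, n, p)
    print(f"n = {n:6d}   M_Z(t) = {value:.5f}   gap to limit = {abs(value - limit):.5f}")
```

The gap should shrink as n grows, in line with the O(n^(-3/2)) remainder inside the bracket.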

4. Lyapunov’s CLT

This theorem, named after the Russian mathematician Aleksandr Lyapunov, states that:

Let $X_1, X_2, \dots, X_n$ be independent r.v.’s (not necessarily identically distributed) with finite expected values $\mu_i$ and variances $\sigma_i^2$, and suppose the random variables $|X_i|$ have moments of some order $(2 + \delta)$. Define

$$s_n^2 = \sum_{i=1}^{n} \sigma_i^2.$$

The rate of growth of these moments is limited by Lyapunov’s condition: if, for some $\delta > 0$,

$$\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} E|X_i - \mu_i|^{2+\delta} = 0$$

is satisfied, then the sum $\dfrac{1}{s_n} \sum_{i=1}^{n} (X_i - \mu_i)$ converges in distribution to the standard normal variate as $n \to \infty$.
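To make Lyapunov’s condition concrete, here is a small sketch for a case that is easy to compute by hand: independent but not identically distributed Bernoulli variables. The success probabilities (cycling through 0.1, 0.2, ..., 0.9) and the function names are assumptions made for this illustration only. With δ = 1 the code evaluates the ratio of the summed third absolute central moments to the cube of s_n, where s_n² is the sum of the variances.

```python
def lyapunov_ratio(ps, delta=1.0):
    """sum_i E|X_i - p_i|^(2+delta) / s_n^(2+delta) for independent Bernoulli(p_i)."""
    s2 = sum(p * (1 - p) for p in ps)          # s_n^2 = sum of the variances p_i * q_i
    # |X_i - p_i| equals (1 - p_i) with probability p_i, and p_i with probability (1 - p_i)
    num = sum(p * (1 - p) ** (2 + delta) + (1 - p) * p ** (2 + delta) for p in ps)
    return num / s2 ** (1 + delta / 2)

def probs(n):
    # Hypothetical, non-identical success probabilities: 0.1, 0.2, ..., 0.9, repeated
    return [0.1 * (1 + i % 9) for i in range(n)]

for n in (9, 90, 900, 9000):
    print(n, lyapunov_ratio(probs(n)))
```

For these bounded variables the numerator grows like n while the denominator grows like n^(3/2), so the ratio falls roughly like 1/√n and the condition holds.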

5. Lindeberg-Levy Theorem

Lindeberg and Levy proved the following case of Central Limit Theorem -

“If $X_1, X_2, \dots, X_n$ are independently and identically distributed random variables with $E(X_i) = \mu_1$ and $V(X_i) = \sigma_1^2$, then $S_n = X_1 + X_2 + \dots + X_n$ is asymptotically normal with mean $\mu = n\mu_1$ and variance $\sigma^2 = n\sigma_1^2$.”

Here the assumptions made are -

• The variables are independently and identically distributed
• $E(X_i^2)$ exists for all i = 1, 2, ..., n.

Proof: Let $M_1(t)$ denote the M.G.F. of each of the deviations $(X_i - \mu_1)$ and $M_Z(t)$ denote the M.G.F. of the standardized variate $Z = (S_n - \mu)/\sigma$.

Since the first two moments $\mu_1'$ and $\mu_2'$ (about origin) of the deviation $(X_i - \mu_1)$ are given by

$$\mu_1' = E(X_i - \mu_1) = 0$$

$$\mu_2' = E(X_i - \mu_1)^2 = \sigma_1^2$$

we have

$$M_1(t) = 1 + \mu_1' t + \mu_2' \frac{t^2}{2!} + \mu_3' \frac{t^3}{3!} + \dots = 1 + \frac{\sigma_1^2 t^2}{2} + O(t^3),$$

where $O(t^3)$ contains terms with $t^3$ and higher powers of $t$.

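We have

$$Z = \frac{S_n - \mu}{\sigma} = \frac{\sum_{i=1}^{n} (X_i - \mu_1)}{\sigma}, \qquad \sigma = \sqrt{n}\,\sigma_1,$$

and since the $X_i$’s are independent, we get

$$M_Z(t) = M_{\sum (X_i - \mu_1)/\sigma}(t) = M_{\sum (X_i - \mu_1)}(t/\sigma) = \prod_{i=1}^{n} M_{(X_i - \mu_1)}(t/\sigma) = [M_1(t/\sigma)]^n = \left[ M_1\!\left( \frac{t}{\sqrt{n}\,\sigma_1} \right) \right]^n = \left[ 1 + \frac{t^2}{2n} + O(n^{-3/2}) \right]^n,$$

where $O(n^{-3/2})$ represents terms involving $n^{3/2}$ and higher powers of $n$ in the denominator. Proceeding to the limit as $n$ tends to infinity,

$$\lim_{n \to \infty} M_Z(t) = \lim_{n \to \infty} \left[ 1 + \frac{t^2}{2n} + O(n^{-3/2}) \right]^n = \lim_{n \to \infty} \left[ 1 + \frac{t^2}{2n} \right]^n = e^{t^2/2},$$

which is the M.G.F. of the standard normal variate. Hence, by the uniqueness theorem of M.G.F.’s, $Z$ is asymptotically $N(0, 1)$, i.e. $S_n = X_1 + X_2 + \dots + X_n$ is asymptotically normal with mean $\mu = n\mu_1$ and variance $\sigma^2 = n\sigma_1^2$.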

If a sequence satisfies Lyapunov’s condition it also satisfies Lindeberg’s condition. However, the converse is not true.

6. Liapounoff’s Central Limit Theorem

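This is the Central Limit Theorem for the generalized case when the variables are not identically distributed and some further conditions are imposed.

Let $X_1, X_2, \dots, X_n$ be independent random variables such that $E(X_i) = \mu_i$ and $V(X_i) = \sigma_i^2$.

Suppose that the third absolute moment, say $\rho_i^3$, of $X_i$ about its mean exists, i.e.

$$\rho_i^3 = E\{|X_i - \mu_i|^3\} \text{ is finite}, \quad i = 1, 2, \dots, n.$$

Further, let $\rho^3 = \sum_{i=1}^{n} \rho_i^3$.

If $\lim_{n \to \infty} \rho/\sigma = 0$, the sum $X = X_1 + X_2 + \dots + X_n$ is asymptotically $N(\mu, \sigma^2)$,

where $\mu = \sum_{i=1}^{n} \mu_i$ and $\sigma^2 = \sum_{i=1}^{n} \sigma_i^2$.

If the variables $X_1, X_2, \dots, X_n$ are identically distributed, then $\rho^3 = \sum \rho_i^3 = n\rho_1^3$

and $\sigma^2 = \sum \sigma_i^2 = n\sigma_1^2$, so that

$$\frac{\rho}{\sigma} = \frac{n^{1/3} \rho_1}{n^{1/2} \sigma_1} = \frac{\rho_1}{\sigma_1} \cdot \frac{1}{n^{1/6}} \to 0 \quad \text{as } n \to \infty.$$

Hence, for identical variables, the conditions of Liapounoff’s theorem are satisfied.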

It may be pointed out that the Lindeberg-Levy theorem is not a particular case of Liapounoff’s theorem, since the former does not assume the existence of the third moment.

7. Different forms of C.L.T

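C.L.T. can be stated in other forms too. For i.i.d. random variables with mean μ and variance σ², a standard form is:

$$\lim_{n \to \infty} P\left[ a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b \right] = \Phi(b) - \Phi(a), \qquad -\infty \le a < b \le \infty,$$

where $\Phi(\cdot)$ is the distribution function of the standard normal variate.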

Remarks:

• It does not matter if we change the non-strict inequalities to strict inequalities, i.e. $P[a \le (\cdot) \le b]$ may be replaced by $P[a < (\cdot) < b]$, because the limiting normal distribution is continuous.
• CLT gives a good approximation in the binomial case when p = ½. For p near 0 or 1, the CLT approximation still holds good, but for that ‘n’ has to be sufficiently large.
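The second remark can be illustrated numerically. The sketch below is a rough check, not a formal error bound: it computes the largest gap between the exact binomial distribution function and the normal approximation without any continuity correction. The values n = 50, n = 2000 and p = 0.05 are arbitrary choices for illustration, and the function names are hypothetical.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

def binomial_pmf(k, n, p):
    # Computed through logarithms so that large n does not overflow
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

def max_cdf_error(n, p):
    """Largest |P(S_n <= k) - Phi((k - np) / sqrt(npq))| over k = 0, 1, ..., n."""
    normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        cdf += binomial_pmf(k, n, p)
        worst = max(worst, abs(cdf - normal.cdf(k)))
    return worst

for n, p in ((50, 0.5), (50, 0.05), (2000, 0.05)):
    print(f"n = {n:5d}, p = {p:4.2f}: max error = {max_cdf_error(n, p):.4f}")
```

For the same n the error should be visibly larger at p = 0.05 than at p = ½, and it should shrink again once n is increased.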

8. Applications of Central Limit Theorem

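(a) If $X_1, X_2, X_3, \dots$ are i.i.d. $B(r, p)$ variates and $S_n = X_1 + X_2 + \dots + X_n$, then

$$E(S_n) = \sum_{i=1}^{n} E(X_i) = nrp \quad \text{and} \quad V(S_n) = \sum_{i=1}^{n} V(X_i) = nrpq.$$

Hence, by the CLT,

$$\lim_{n \to \infty} P\left[ a \le \frac{S_n - nrp}{\sqrt{nrpq}} \le b \right] = \Phi(b) - \Phi(a), \qquad (*)$$

where $\Phi(\cdot)$ is the distribution function of the standard normal variate.

(b) Let $X_1, X_2, \dots$ be i.i.d. Bernoulli variates, i.e. $B(1, p)$; then $S_n = X_1 + X_2 + \dots + X_n \sim B(n, p)$. Hence, taking r = 1 in (*) of part (a), we get

$$\lim_{n \to \infty} P\left[ a \le \frac{S_n - np}{\sqrt{npq}} \le b \right] = \Phi(b) - \Phi(a).$$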

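(c) If $Y_n$ is distributed as $P(n)$, i.e. Poisson with parameter n, then

$$\lim_{n \to \infty} P\left[ a \le \frac{Y_n - n}{\sqrt{n}} \le b \right] = \Phi(b) - \Phi(a).$$

Thus, for instance,

$$\lim_{n \to \infty} P(Y_n \le n) = \tfrac{1}{2}, \quad \text{i.e.} \quad \lim_{n \to \infty} \sum_{k=0}^{n} \frac{e^{-n} n^k}{k!} = \tfrac{1}{2}.$$

Proof: Let $X_1, X_2, \dots$ be i.i.d. $P(1)$ variates. Then $S_n = X_1 + X_2 + \dots + X_n \sim P(n)$, so $Y_n$ has the same distribution as $S_n$, with $E(S_n) = n$ and $V(S_n) = n$, and the CLT gives the stated limit.

In particular, taking a = −∞ and b = 0, we get

$$P(Y_n \le n) = P\left[ \frac{Y_n - n}{\sqrt{n}} \le 0 \right] \to \Phi(0) = \tfrac{1}{2} \quad \text{as } n \to \infty.$$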
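The Poisson limit in (c) is easy to check by direct summation. The following sketch evaluates the sum of e^(−n) n^k / k! for k = 0, ..., n at a few values of n; the function name `poisson_cdf_at_mean` and the chosen values of n are illustrative assumptions, and the terms are computed through logarithms only to avoid overflow.

```python
from math import exp, lgamma, log

def poisson_cdf_at_mean(n):
    """P(Y_n <= n) = sum_{k=0}^{n} e^{-n} n^k / k!  for Y_n ~ P(n)."""
    # log of each term: -n + k*log(n) - log(k!), with log(k!) = lgamma(k + 1)
    return sum(exp(-n + k * log(n) - lgamma(k + 1)) for k in range(n + 1))

for n in (1, 10, 100, 1000, 10000):
    print(n, round(poisson_cdf_at_mean(n), 4))
```

The values should decrease towards ½ as n grows, though the convergence is slow, so the probability remains noticeably above ½ for moderate n.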

Relationship between Central Limit Theorem and Weak Law of Large Numbers

1. Both the Central Limit Theorem and the Weak Law of Large Numbers (WLLN) hold good for a sequence of i.i.d. random variables with finite mean μ and variance σ². However, in this case the CLT gives a stronger result than the WLLN, in the sense that the former provides an estimate of $P[|(S_n - n\mu)/n| \ge \varepsilon]$ as given below:

$$P\left[ \left| \frac{S_n - n\mu}{n} \right| \ge \varepsilon \right] \approx 2\left[ 1 - \Phi\!\left( \frac{\varepsilon\sqrt{n}}{\sigma} \right) \right],$$

where $\Phi(\cdot)$ is the distribution function of the standard normal variate. However, the WLLN does not require the existence of variance.

2. For a sequence {Xn} of independent and uniformly bounded r.v.’s, the WLLN holds good, and the CLT holds in this case provided $B_n = \operatorname{Var}(X_1 + X_2 + \dots + X_n) = \sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2 \to \infty$ as $n \to \infty$.

3. For a sequence {Xn} of independent r.v.’s, the CLT may hold good while the WLLN may not hold.
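Point 1 can be made concrete with a few numbers. The sketch below evaluates the CLT-based estimate 2[1 − Φ(ε√n/σ)] of the probability that the sample mean deviates from μ by at least ε. For comparison it also prints Chebyshev’s bound σ²/(nε²); that bound is not mentioned in the text above and is brought in here only as an assumed reference point. The values σ = 1 and ε = 0.2 are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf   # distribution function of the standard normal variate

def clt_estimate(n, eps, sigma):
    """CLT approximation to P[|S_n/n - mu| >= eps]: 2 * (1 - Phi(eps * sqrt(n) / sigma))."""
    return 2.0 * (1.0 - PHI(eps * sqrt(n) / sigma))

def chebyshev_bound(n, eps, sigma):
    """Chebyshev upper bound sigma^2 / (n * eps^2), capped at 1 (added for comparison only)."""
    return min(1.0, sigma ** 2 / (n * eps ** 2))

sigma, eps = 1.0, 0.2    # assumed values for illustration
for n in (25, 100, 400):
    print(f"n = {n:4d}   CLT estimate = {clt_estimate(n, eps, sigma):.4f}   "
          f"Chebyshev bound = {chebyshev_bound(n, eps, sigma):.4f}")
```

Both quantities tend to 0 as n grows, but the CLT expression gives an approximate value of the probability rather than only an upper bound.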

9. Uses of CLT

• CLT explains why many commonly used estimators follow an approximately normal distribution, which means that tables of values (or built-in functions in statistical software and programming languages such as R or C) can be used to construct confidence intervals and approximate p-values. This makes it very practical.
• CLT can be applied to almost any kind of distribution. It allows statisticians to develop standardized methods to derive much useful information from almost any sample by obtaining CLT-based statistics and hypothesis tests.
• Under certain conditions, CLT also holds good for variables which are not independent.

10. Examples of Central Limit Theorem

1. Suppose a school has 1200 students, with 200 in each grade from the 7th to the 12th standard. Here, each student contributes one observation, and the observations are independent of each other.

Suppose we take 10 samples of 25 students each and find the mean grade of each sample. The first sample has a mean of 9.52, the second a mean of 9.32, and so on. The mean grades of the samples are shown in the table below:

Sample (n = 25)    Average Grade
1                  9.52
2                  9.32
3                  9.08
4                  8.80
5                  9.48
6                  9.36
7                  9.48
8                  10.12
9                  9.64
10                 9.35

When samples are taken repeatedly and their means are calculated, the means start to form their own distribution. This distribution is the sampling distribution, because it represents the distribution of estimates from the population over repeated samples. In such a case, a histogram of the sample means of, say, 1,000 samples would appear roughly bell-shaped.

The shape of the distribution of the means of samples of size 25 looks a bit like a Gaussian distribution (Normal Distribution), even though the original distribution is uniform, and the shape of the distribution of sample means tends more towards the normal distribution as we keep increasing ‘n’.

It is easy to show that the mean of this sampling distribution is the population mean, and that its variance is equal to the population variance divided by n. Taking the square root of this variance gives the population standard deviation divided by √n, which is known as the standard error. To conclude, this example shows that the mean of the sample means will be equal to the population mean, and the variance of the sample means will get smaller with

i) a decrease in population variance, or

ii) an increase in sample size.
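The school example can be reproduced with a short simulation. This is a sketch under stated assumptions: the population is taken to be exactly 200 students in each of grades 7 to 12, samples are drawn with replacement so that the draws stay independent, and the seed 42 is arbitrary. It does not reproduce the ten sample means in the table, which come from the original article.

```python
import random
import statistics

rng = random.Random(42)

# Assumed population: 200 students in each of grades 7 to 12 (1200 in total).
population = [grade for grade in range(7, 13) for _ in range(200)]
pop_mean = statistics.fmean(population)        # population mean
pop_var = statistics.pvariance(population)     # population variance

n, num_samples = 25, 1000
# Sampling with replacement keeps the 25 draws in each sample independent.
means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)]

print("mean of sample means    :", round(statistics.fmean(means), 3), "vs population mean", pop_mean)
print("variance of sample means:", round(statistics.pvariance(means), 4), "vs sigma^2 / n =", round(pop_var / n, 4))
print("standard error sigma / sqrt(n):", round((pop_var / n) ** 0.5, 4))
```

Raising `n` in this sketch should shrink the variance of the sample means in proportion to 1/n, which is point ii) above.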

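2. Let $X_1, X_2, \dots$ be i.i.d. Poisson variates with parameter λ. Use the central limit theorem to estimate

$P(120 \le S_n \le 160)$, where $S_n = X_1 + X_2 + \dots + X_n$, λ = 2 and n = 75.

Solution:

Since the $X_i$’s are i.i.d. $P(\lambda)$,

$E(X_i) = \lambda$ and $\operatorname{Var}(X_i) = \lambda$, so that

$E(S_n) = n\lambda$ and $\operatorname{Var}(S_n) = \operatorname{Var}(X_1 + X_2 + \dots + X_n) = n\lambda$.

Hence, by the Lindeberg-Levy CLT (for large n), $S_n$ is approximately

$N(n\lambda, n\lambda) = N(\mu = 150, \sigma^2 = 150)$ for n = 75, λ = 2.

Therefore,

$$P(120 \le S_n \le 160) \approx P\left[ \frac{120 - 150}{\sqrt{150}} \le Z \le \frac{160 - 150}{\sqrt{150}} \right] = \Phi(0.82) - \Phi(-2.45) \approx 0.79.$$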
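The normal approximation for this example can be evaluated, and compared with the exact Poisson probability, in a few lines. This is a sketch for checking the arithmetic; the exact comparison relies on the fact that a sum of 75 i.i.d. P(2) variates is P(150), and the variable and function names are illustrative.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

lam, n = 2, 75
mu = n * lam                  # E(S_n) = 150
sigma = sqrt(n * lam)         # sqrt(Var(S_n)) = sqrt(150)

# CLT estimate: Phi((160 - mu) / sigma) - Phi((120 - mu) / sigma)
std_normal = NormalDist()
clt_estimate = std_normal.cdf((160 - mu) / sigma) - std_normal.cdf((120 - mu) / sigma)

def poisson_pmf(k, mean):
    # Computed through logarithms to avoid overflow in mean**k and k!
    return exp(-mean + k * log(mean) - lgamma(k + 1))

# Exact probability for comparison, using S_n ~ P(150)
exact = sum(poisson_pmf(k, mu) for k in range(120, 161))

print(f"CLT estimate  = {clt_estimate:.4f}")
print(f"exact Poisson = {exact:.4f}")
```

The two values should agree to within a couple of percentage points; no continuity correction is applied here, which accounts for part of the gap.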

3. The probability distribution for total distance covered in a walk (biased or unbiased) will tend towards a normal distribution.

4. Flipping a coin a large number of times n results in an approximately normal distribution for the total number of heads (or, equivalently, the total number of tails).

Try yourself:

• A distribution with unknown mean μ has variance equal to 1.5. Use the central limit theorem to find how large a sample should be taken from the distribution so that the probability is at least 0.95 that the sample mean will be within 0.5 of the population mean.
• The lifetime of a certain brand of electric tube light is considered a random variable with mean 1,200 hours and standard deviation 250 hours. Using the central limit theorem, find the probability that the average lifetime of 60 tube lights exceeds 1400 hours.

Sir Francis Galton was an English Victorian statistician, progressive, polymath, sociologist, psychologist, anthropologist, eugenicist, tropical explorer, geographer, inventor, meteorologist, proto-geneticist, and psychometrician, knighted in 1909. He produced over 340 papers and books, created the statistical concept of correlation, and widely promoted regression towards the mean. He was the first man to apply statistical methods to the study of human differences and inheritance of intelligence, and introduced the use of questionnaires and surveys for collecting data on human communities, which he needed for genealogical and biographical works and for his anthropometric studies. He wrote:

"I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the “Law of Frequency of Error”. The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion. The huger the mob, and the greater the apparent anarchy, the more perfect is its sway. It is the supreme Law of Unreason. Whenever a large sample of chaotic elements are taken in hand and marshaled in the order of their magnitude, an unsuspected and most beautiful form of regularity proves to have been latent all along".