Math Club
v1 · canonical standard

Lesson 77 — Central Limit Theorem

The standardized mean of n iid random variables with finite variance converges in distribution to a normal law, whatever the original distribution; this is often described as the most important result in statistics. The lesson covers the proof via characteristic functions, the Berry-Esseen bound, and applications to inference.

Used in: 2nd year of Ensino Médio (ages 16-17) · Japanese Math B §4.4 · German Stochastik LK · Singaporean H2 Math ch. 21

$$\bar X_n \xrightarrow{d} \mathcal{N}\!\left(\mu,\,\frac{\sigma^2}{n}\right) \quad (n \to \infty)$$

Formal statement and proof

Lindeberg-Lévy version
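
For reference, a standard textbook formulation of the Lindeberg-Lévy theorem is sketched below, using the lesson's symbols ($\mu$, $\sigma^2$, $\bar X_n$). The wording is a common formulation and not necessarily the exact statement used in this lesson; $\Phi$ is introduced here for the standard normal CDF.

```latex
\textbf{Theorem (Lindeberg-L\'evy).} Let $X_1, X_2, \ldots$ be iid with
$E[X_i] = \mu$ and $0 < \operatorname{Var}(X_i) = \sigma^2 < \infty$, and let
$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then
\[
  \frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \xrightarrow{d} \mathcal{N}(0,1)
  \quad (n \to \infty),
\]
that is, for every real $x$,
\[
  \lim_{n\to\infty}
  P\!\left(\frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \le x\right) = \Phi(x).
\]
```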

"The central limit theorem is the unofficial sovereign of probability theory." — Grinstead & Snell, Introduction to Probability, §9.1

Version for sums

If $S_n = X_1 + \cdots + X_n$, then $S_n \approx \mathcal{N}(n\mu,\, n\sigma^2)$ for large $n$.

$$Z_n = \frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0,1)$$
What this means: standardizing the sum gives the same $Z_n$ as standardizing the mean; only the scale changes.
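
As a rough numerical illustration of the sum version, the sketch below computes the exact distribution of the sum of $n$ fair dice by repeated convolution and compares its CDF with the normal CDF of $\mathcal{N}(n\mu, n\sigma^2)$, where $\mu = 3.5$ and $\sigma^2 = 35/12$ for one die. The function names and the half-unit continuity correction are choices made for this example, not part of the lesson.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of N(mean, sd^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def dice_sum_pmf(n):
    """Exact pmf of the sum of n fair six-sided dice as {value: probability}."""
    pmf = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0.0) + p / 6.0
        pmf = nxt
    return pmf

def max_cdf_gap(n):
    """Largest gap between the exact CDF of S_n and its normal approximation."""
    mu, var = 3.5, 35.0 / 12.0            # mean and variance of one die
    mean_s, sd_s = n * mu, math.sqrt(n * var)
    pmf = dice_sum_pmf(n)
    cdf, gap = 0.0, 0.0
    for k in sorted(pmf):
        cdf += pmf[k]
        # Assumption of this sketch: evaluate the normal CDF at k + 0.5
        # (continuity correction) because S_n is integer-valued.
        approx = normal_cdf(k + 0.5, mean_s, sd_s)
        gap = max(gap, abs(cdf - approx))
    return gap

for n in (2, 10, 100):
    print(n, max_cdf_gap(n))
```

The printed gaps shrink as $n$ grows, which is the behaviour the approximation $S_n \approx \mathcal{N}(n\mu, n\sigma^2)$ predicts.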

Convergence speed: Berry-Esseen inequality
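
The heading refers to the Berry-Esseen inequality. A standard iid formulation is sketched below; the symbols $\rho$, $F_n$, $\Phi$ and $C$ are introduced here for illustration and are not defined elsewhere in the lesson.

```latex
Let $X_1, \ldots, X_n$ be iid with mean $\mu$, variance $\sigma^2 > 0$ and
finite third absolute central moment $\rho = E\,|X_1 - \mu|^3$. Writing $F_n$
for the CDF of $Z_n = (S_n - n\mu)/(\sigma\sqrt{n})$ and $\Phi$ for the
standard normal CDF,
\[
  \sup_{x \in \mathbb{R}} \bigl| F_n(x) - \Phi(x) \bigr|
  \;\le\; \frac{C\,\rho}{\sigma^3 \sqrt{n}},
\]
where $C$ is an absolute constant (known to be smaller than $0.5$).
```

The bound decays like $1/\sqrt{n}$ and grows with the ratio $\rho/\sigma^3$, so distributions with heavier or more skewed tails may need larger samples before the normal approximation is accurate.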

Proof sketch via characteristic function

Let $Y_i = (X_i - \mu)/\sigma$ (zero mean, unit variance). Because $Y_i$ has a finite second moment, its characteristic function $\varphi_{Y_i}$ has the Taylor expansion

$$\varphi_{Y_i}(t) = 1 - \frac{t^2}{2} + o(t^2) \quad (t \to 0).$$

For $Z_n = (Y_1 + \cdots + Y_n)/\sqrt{n}$ and each fixed $t$, independence and identical distribution give

$$\varphi_{Z_n}(t) = \left[\varphi_{Y_i}\!\left(\frac{t}{\sqrt{n}}\right)\right]^n = \left[1 - \frac{t^2}{2n} + o\!\left(\frac{1}{n}\right)\right]^n \xrightarrow{n\to\infty} e^{-t^2/2}.$$

But $e^{-t^2/2}$ is the characteristic function of $\mathcal{N}(0, 1)$, so Lévy's continuity theorem gives $Z_n \xrightarrow{d} \mathcal{N}(0,1)$. $\blacksquare$
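
The limit in the proof sketch can be checked numerically for one concrete distribution. The sketch below assumes $X \sim \text{Exp}(1)$, so that $Y = X - 1$ has zero mean, unit variance, and characteristic function $e^{-it}/(1 - it)$; the choice of distribution and the function names are illustrative only.

```python
import cmath
import math

def phi_centered_exp(s):
    """Characteristic function of Y = X - 1 with X ~ Exp(1) (mean 0, variance 1)."""
    return cmath.exp(-1j * s) / (1 - 1j * s)

def phi_zn(t, n):
    """Characteristic function of Z_n = (Y_1 + ... + Y_n) / sqrt(n) for iid Y_i."""
    return phi_centered_exp(t / math.sqrt(n)) ** n

def gap_to_normal(t, n):
    """Distance between phi_{Z_n}(t) and the N(0, 1) value exp(-t^2 / 2)."""
    return abs(phi_zn(t, n) - math.exp(-t * t / 2))

for n in (1, 10, 100, 10_000, 1_000_000):
    print(n, gap_to_normal(1.0, n))
```

The gap at $t = 1$ shrinks as $n$ increases, in line with the pointwise convergence $\varphi_{Z_n}(t) \to e^{-t^2/2}$ used in the proof.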

When the CLT does not hold

Essential hypotheses

  • Independence (sufficient but not necessary; it can be relaxed to $\alpha$-mixing under additional conditions).
  • Finite variance: $\sigma^2 < \infty$.
  • $n$ sufficiently large. Rule of thumb: $n \geq 30$ for distributions that are not too skewed; $n \geq 100$ for highly skewed ones.
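
To see why the finite-variance hypothesis matters, the simulation sketch below compares the spread of sample means for an exponential distribution (finite variance) and a standard Cauchy distribution (no finite mean or variance). The interquartile range is used as the spread measure because it exists for both; the seed, replication count and helper names are arbitrary choices for this example.

```python
import math
import random
import statistics

def iqr_of_sample_means(draw, n, reps, rng):
    """Interquartile range of `reps` simulated sample means of size n."""
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

def draw_exponential(rng):
    """One draw from Exp(1)."""
    return rng.expovariate(1.0)

def draw_cauchy(rng):
    """One draw from a standard Cauchy via inverse-CDF sampling."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(77)  # fixed seed (arbitrary) so the run is reproducible
for name, draw in (("exponential", draw_exponential), ("cauchy", draw_cauchy)):
    for n in (1, 100):
        print(name, n, iqr_of_sample_means(draw, n, 2000, rng))
```

For the exponential case the spread of $\bar X_n$ should shrink roughly like $1/\sqrt{n}$, while for the Cauchy case it should stay near 2 for every $n$, since the mean of $n$ standard Cauchy variables is again standard Cauchy.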

Solved examples

Exercise list

37 exercises · 9 with worked solutions (24%)

Application 20 · Understanding 4 · Modeling 9 · Challenge 2 · Proof 2
  1. Ex. 77.1 · Application

    $X$ exponential with $\mu = 1$ and $\sigma = 1$. Write the approximate distribution of $\bar X_{100}$ and calculate $\sigma_{\bar X}$.

  2. Ex. 77.2 · Application

    $X$ uniform on $[0, 1]$. Determine $\mu$ and $\sigma^2$, and write the approximate distribution of $\bar X_{50}$ by the CLT.

  3. Ex. 77.3 · Application · Answer key

    Roll 100 fair dice. Determine the approximate distribution of the sum $S_{100}$, stating $E[S]$ and $\text{Var}(S)$.

  4. Ex. 77.4 · Application

    $X \sim \text{Bernoulli}(0.3)$. Write the approximate distribution of $\bar X_{200}$ by the CLT and calculate the standard deviation of the sample proportion.

  5. Ex. 77.5 · Application

    A population has $\mu = 50$ and $\sigma = 10$. For $n = 25$, calculate the standard deviation of $\bar X$ (as an integer).

  6. Ex. 77.6 · Application · Answer key

    Using the data from 77.5 ($\mu = 50$, $\sigma = 10$, $n = 25$), calculate $P(\bar X > 53)$.

  7. Ex. 77.7 · Application

    With the same parameters ($\mu = 50$, $\sigma = 10$, $n = 25$), calculate $P(\bar X < 47)$.

  8. Ex. 77.8 · Application

    $X$ with $\mu = 100$, $\sigma = 20$, $n = 100$. Calculate $P(98 < \bar X < 102)$.

  9. Ex. 77.9 · Application

    Sum of 50 iid random variables with $\mu = 5$, $\sigma = 2$. Calculate $P(S_{50} > 270)$.

  10. Ex. 77.10 · Application

    $X$ with $\mu = 10$, $\sigma = 3$. How many observations $n$ are needed for a 95% CI with margin of error $\pm 0.5$?

  11. Ex. 77.11 · Understanding

    When the sample size $n$ is multiplied by 4, the standard deviation of $\bar X$ ($= \sigma/\sqrt{n}$):

  12. Ex. 77.12 · Understanding

    $X$ has a very skewed distribution (skewness = 3). For what sample size $n$ is the CLT approximation reasonable?

  13. Ex. 77.13 · Application

    Grades with $\mu = 70$, $\sigma = 15$. Sample $n = 36$. Calculate $P(\bar X > 75)$.

  14. Ex. 77.14 · Application

    With the same parameters as 77.13 ($\mu = 70$, $\sigma = 15$, $n = 36$), calculate $P(\bar X < 65)$.

  15. Ex. 77.15 · Application

    With $\mu = 70$, $\sigma = 15$, $n = 36$ and $\bar X = 72$, construct a 95% CI for $\mu$.

  16. Ex. 77.16 · Application · Answer key

    Package weight: $\mu = 500$ g, $\sigma = 50$ g. Sample $n = 25$. Calculate $P(\bar X > 520)$.

  17. Ex. 77.17 · Application · Answer key

    With the parameters from 77.16 ($\mu = 500$ g, $\sigma = 50$ g, $n = 25$), calculate $P(485 < \bar X < 515)$.

  18. Ex. 77.18 · Application

    Response time: $\mu = 50$ ms, $\sigma = 10$ ms. Mean of 100 measurements. What is the 95% SLA limit?

  19. Ex. 77.19 · Application

    Roll a die 1,000 times. Calculate $P(\bar X > 3.6)$.

  20. Ex. 77.20 · Application

    Using the distribution of the sum $S_{1000}$ of 1,000 die rolls, calculate $P(S_{1000} > 3600)$.

  21. Ex. 77.21 · Application

    $X \sim \text{Exp}(1)$ ($\mu = 1$, $\sigma = 1$). Calculate $P(\bar X_{100} > 1.1)$.

  22. Ex. 77.22 · Application · Answer key

    Election poll: $p = 0.40$, $n = 1000$. Calculate $P(\hat p > 0.43)$.

  23. Ex. 77.23 · Modeling · Answer key

    You hold 50 independent stocks; daily return of each: $\mu = 0.1\%$, $\sigma = 2\%$. What is the distribution of the average daily return of the portfolio?

  24. Ex. 77.24 · Modeling · Answer key

    ML model: individual error $\sigma = 0.5$. Calculate the standard deviation of the mean error over 1,000 predictions.

  25. Ex. 77.25 · Modeling

    Determine the sample size to detect a 5% difference in proportions with $\alpha = 0.05$ and 80% power.

  26. Ex. 77.26 · Modeling

    Monte Carlo estimate of $\pi$: $n$ random points in the square $[0,1]^2$, count those falling in the quarter-disk. What is the standard deviation of the estimate of $\pi$ as a function of $n$?

  27. Ex. 77.27 · Modeling

    Batch of 500 parts: $\mu = 100$ g, $\sigma = 5$ g. Determine the distribution of the total mass $S_{500}$.

  28. Ex. 77.28 · Modeling

    Bus wait time: $\mathcal{U}[0, 30]$ min. Calculate $P(\bar T_{50} > 16)$ for the average wait of 50 passengers.

  29. Ex. 77.29 · Modeling

    X-bar control chart with $n = 5$. The control limits are $\bar X \pm 3\sigma_{\bar X}$. Calculate the interval width in terms of the process $\sigma$.

  30. Ex. 77.30 · Modeling

    Satisfaction survey: margin of error $\pm 3\%$ at 95% confidence, $p$ unknown. What is the minimum $n$?

  31. Ex. 77.31 · Modeling · Answer key

    Call time: $\mu = 3$ min, $\sigma = 1.5$ min. 100 calls per hour. Determine the distribution of total time and calculate $P(\text{total} > 330\text{ min})$.

  32. Ex. 77.32 · Challenge · Answer key

    A/B test: 10,000 visitors per variant; conversion rate A = 5%, B = 6%. Is the 1 percentage point lift statistically significant? Calculate the $z$-value and $p$-value.

  33. Ex. 77.33 · Understanding

    Which of the following correctly describes the Central Limit Theorem?

  34. Ex. 77.34 · Understanding

    Why does the classical Lindeberg-Lévy CLT not apply to the Cauchy distribution?

  35. Ex. 77.35 · Challenge

    Simulate the CLT in Python for an exponential distribution with $\lambda = 1$. Generate histograms of 10,000 sample means for $n \in \lbrace 1, 5, 30 \rbrace$ and compare visually with the theoretical normal curve.

  36. Ex. 77.36 · Proof

    Sketch the proof of the CLT via characteristic function, indicating where each hypothesis (finite variance, iid) is used.

  37. Ex. 77.37 · Proof

    Show that the CLT implies the Weak Law of Large Numbers: if $Z_n \xrightarrow{d} \mathcal{N}(0,1)$, then $\bar X_n \xrightarrow{P} \mu$.
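
Many of the application exercises above reduce to the same computation: standardize with the standard error $\sigma/\sqrt{n}$ and read a probability off the standard normal CDF. A minimal sketch of that computation follows, using only the Python standard library; the helper names are hypothetical, and the two calls reuse the parameters of Ex. 77.6 and Ex. 77.17.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_mean_above(x, mu, sigma, n):
    """CLT approximation of P(sample mean > x) for a sample of size n."""
    se = sigma / math.sqrt(n)          # standard error of the mean
    return 1.0 - normal_cdf((x - mu) / se)

def prob_mean_between(a, b, mu, sigma, n):
    """CLT approximation of P(a < sample mean < b) for a sample of size n."""
    se = sigma / math.sqrt(n)
    return normal_cdf((b - mu) / se) - normal_cdf((a - mu) / se)

# Ex. 77.6: mu = 50, sigma = 10, n = 25, so z = (53 - 50) / 2 = 1.5
print(round(prob_mean_above(53, mu=50, sigma=10, n=25), 4))             # 0.0668

# Ex. 77.17: mu = 500, sigma = 50, n = 25, so z = -1.5 and z = 1.5
print(round(prob_mean_between(485, 515, mu=500, sigma=50, n=25), 4))    # 0.8664
```

These values rely on the normal approximation being adequate at the given $n$; they are not exact probabilities unless the population itself is normal.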

Sources

  • OpenIntro Statistics (4th ed) — Diez, Çetinkaya-Rundel, Barr · 2019 · CC-BY-SA. Primary source for exercises 77.2, 77.4, 77.8, 77.11, 77.14–17, 77.22–23, 77.25–26, 77.28, 77.30, 77.33–34.
  • OpenStax Statistics — Illowsky, Dean · 2022 · CC-BY. Source for exercises 77.1, 77.3, 77.5–7, 77.9–10, 77.12–13, 77.18–19, 77.21, 77.24, 77.27, 77.29, 77.31, 77.35 and examples 1–3.
  • Introduction to Probability (Grinstead-Snell) — Grinstead, Snell · Dartmouth · GNU FDL. Source for exercises 77.19–20, 77.26, 77.36–37 and example 5.

Updated on 2025-05-14 · Author(s): Clube da Matemática
