Lesson 80 — Consolidation Term 8 — Applied Statistics and Probability
Integrative workshop: central measures, variance, quartiles, discrete r.v., binomial, normal, CLT, correlation, and Bayes in real-world problems.
Used in: 2nd year of upper secondary school (Brazilian Ensino Médio, ages 16-17) · Equiv. German Stochastik LK · Equiv. Japanese Math B · Equiv. H2 Maths Statistics (Singapore)
Rigorous notation, full derivation, hypotheses
Formal synthesis of the term
Descriptive statistics
"Variance is the average of the squared deviations from the mean. For a sample, divide by (Bessel's correction) instead of ." — OpenIntro Statistics §2.1
Discrete random variable
"Expectation is a weighted average of the possible values of , weighted by their probabilities." — Grinstead & Snell §6.1
Parametric distributions
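For the binomial and normal families used in this term, a short numerical comparison may help. The sketch below computes an exact binomial probability and its normal approximation with continuity correction. The helper `binomial_cdf` and the parameters $n = 40$, $p = 0.5$, $k = 22$ are hypothetical, chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summing the pmf term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Illustrative parameters (assumptions, not taken from the exercises).
n, p, k = 40, 0.5, 22

exact = binomial_cdf(k, n, p)

# Normal approximation with the same mean and standard deviation.
mu = n * p
sigma = sqrt(n * p * (1 - p))
# Continuity correction: P(X <= k) is approximated by P(Y <= k + 0.5).
approx = NormalDist(mu, sigma).cdf(k + 0.5)

print(round(exact, 4), round(approx, 4))
```

With these parameters both $np$ and $n(1-p)$ equal 20, so the usual rule of thumb for the approximation is comfortably met.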
Central Limit Theorem
"The CLT is arguably the most important result in all of probability theory. It states that the distribution of the sample mean approaches normal regardless of the original distribution of ." — OpenIntro Statistics §4.4
Correlation and regression
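As a sketch of the two quantities in this block, the code below computes the Pearson coefficient $r$ and the least-squares line of $y$ on $x$ from sums of deviations. The data pairs are illustrative assumptions and are not taken from the exercises.

```python
from math import sqrt

# Illustrative pairs (assumptions, not taken from the exercises).
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squared deviations.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

# Pearson correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Least-squares line: y_hat = intercept + slope * x.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(r, slope, intercept)
```

The fitted line always passes through the point of means, so the prediction at `x_bar` equals `y_bar`.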
Bayes' Rule
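For a diagnostic test, Bayes' rule gives the positive predictive value as $P(D \mid +) = \dfrac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \bar{D})\,P(\bar{D})}$. The sketch below evaluates this formula. The function name `positive_predictive_value` and the input values are hypothetical, chosen only for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(disease | positive test), from Bayes' rule and total probability."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# Illustrative inputs (assumptions, not taken from the exercises).
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.99, specificity=0.95)
print(round(ppv, 4))
```

Even with a sensitive test, a low prevalence keeps the PPV small, because false positives from the large healthy group outnumber the true positives.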
Term 8 Pipeline. Each block corresponds to a group of lessons (72–73, 74–76, 77, 78–79).
Solved examples
Exercise list
37 exercises · 9 with worked solutions (about 24%)
- Ex. 80.1 · Application · Answer key
Sample: 4, 6, 8, 8, 9, 10, 10, 11, 12, 22. Calculate the median, $Q_1$, $Q_3$, and IQR, and identify outliers using Tukey's fences.
- Ex. 80.2 · Application
Same sample as exercise 80.1. Calculate the mean and sample standard deviation. Compare with the median: which is more representative of the central position? Why?
- Ex. 80.3 · Application
. Calculate , , and .
- Ex. 80.4 · Application
For : verify the normal approximation criterion and, even if borderline, use the approximation with continuity correction to estimate .
- Ex. 80.5 · Application
. Calculate .
- Ex. 80.6 · Application
Sample from population with , . Calculate .
- Ex. 80.7 · Application
Pairs $(x, y)$: (1, 2), (2, 4), (3, 5), (4, 4), (5, 7). Calculate the Pearson correlation coefficient $r$ and the regression line of $y$ on $x$.
- Ex. 80.8 · Application
A disease has 2% prevalence; the test has 90% sensitivity and 95% specificity. Calculate the positive predictive value (PPV).
- Ex. 80.9 · Application
where . Determine the distribution of .
- Ex. 80.10 · Application · Answer key
A fair die is rolled 50 times. Calculate the expectation and standard deviation of the sum $S = X_1 + \dots + X_{50}$ of the results.
- Ex. 80.11 · Understanding
Which statement about dispersion measures is correct?
- Ex. 80.12 · Understanding · Answer key
Which relationship is true for the point probabilities of the binomials at the mode?
- Ex. 80.13 · Understanding
Explain in your own words: does the CLT state that, for large $n$, individual data points follow a normal distribution? If not, what exactly converges to normal?
- Ex. 80.14 · Understanding · Answer key
Which statement about Pearson correlation is correct?
- Ex. 80.15 · Modeling
Pizza delivery time: min. What is the maximum deadline covering 95% of deliveries?
- Ex. 80.16 · Modeling · Answer key
With the model from exercise 80.15 (SLA of 40.2 min, 5% violation rate), calculate the expectation of the number of violations in 100 deliveries.
- Ex. 80.17 · Modeling
Real estate market: correlation between area and price is . Means , (in R\$); standard deviations $s_x = 20$, $s_y = 80{,}000$. Find the regression line and predict the price for an average-area property.
- Ex. 80.18 · Modeling
Financial portfolio: 100 independent stocks, each with daily return . Determine the distribution of the daily return of the equal-weighted portfolio.
- Ex. 80.19 · Modeling
Six Sigma: parts with dimension mm. Tolerance mm. Calculate the proportion of defects and estimate defects per million.
- Ex. 80.20 · Modeling
Election poll: , . Construct a 95% confidence interval for the true proportion .
- Ex. 80.21 · Modeling
Spam filter: 80% of spam contains "FREE", 5% of ham contains it. . An email contains "FREE" — apply Bayes and classify.
- Ex. 80.22 · Modeling · Answer key
Production line: 2% defect rate, batch of 200 parts. Estimate via Poisson approximation and normal approximation. Compare results.
- Ex. 80.23 · Modeling
Financial portfolio: assets A () and B () with correlation . Calculate the standard deviation of a 50%/50% portfolio.
- Ex. 80.24 · Modeling
Two independent diagnostic tests, both positive: test 1 (sens 90%, spec 95%), test 2 (sens 85%, spec 90%). Prevalence 1%. Apply Bayes sequentially and calculate final PPV.
- Ex. 80.25 · Modeling
Vaccine clinical trial: 100 vaccinated, 5 sick; 100 placebo, 25 sick. Calculate the vaccine efficacy and assess (informally) whether the difference is statistically significant.
- Ex. 80.26 · Modeling
Call center: in each minute, each of 120 agents receives a call with 2% probability. Model the number of simultaneous calls in 1 minute and calculate .
- Ex. 80.27 · Modeling
Heights of adult men in Brazil: cm, cm. What percentage does not pass through a 180 cm door? What door height covers 99% of the male population?
- Ex. 80.28 · Challenge
Explain, with a numerical example, why in highly right-skewed distributions the median is more informative than the mean, and IQR more informative than standard deviation.
- Ex. 80.29 · Challenge · Answer key
Describe intuitively and mathematically how Bayesian inference converges to frequentist inference (MLE) as $n \to \infty$. Which theorem formalizes this convergence?
- Ex. 80.30 · Challenge · Answer key
Construct a data example where $X$ and $Y$ are strongly correlated but the relationship between $X$ and $Y$ is entirely explained by a confounder $Z$. Make the mathematical mechanism explicit.
- Ex. 80.31 · Challenge
Generate (theoretically) 100 independent random variables . Use CLT to approximate .
- Ex. 80.32 · Challenge
ENEM: public school has , (Math); private school has , . Samples of from each. What is the probability that the private sample mean exceeds the public one by more than 80 points?
- Ex. 80.36 · Proof
Prove that from the definition .
- Ex. 80.37 · Proof
Show that if are iid, then . Conclude that by the Law of Large Numbers.
- Ex. 80.38 · Proof · Answer key
Prove that and for any .
- Ex. 80.39 · Proof
Derive Bayes' rule from the definition of conditional probability and the law of total probability.
- Ex. 80.40 · Proof
State the CLT formally. Sketch the proof via characteristic functions (indicate the steps; a justification of Lévy's continuity theorem is not required).
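To check hand computations in the descriptive exercises above (quartiles, IQR, Tukey's fences), a small helper can be useful. The sketch below assumes the median-of-halves convention for quartiles, and other conventions can give slightly different values. The function `tukey_summary` and the sample are hypothetical and are not one of the exercise data sets.

```python
from statistics import median


def tukey_summary(values):
    """Median, quartiles (median-of-halves), IQR, fences, and outliers."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For odd sample sizes the middle value is excluded from both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]

    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]

    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "fences": (low_fence, high_fence),
        "outliers": outliers,
    }


# Illustrative sample (an assumption, not one of the exercise data sets).
summary = tukey_summary([1, 3, 4, 5, 5, 6, 7, 8, 9, 30])
print(summary)
```

Here the single large value falls above the upper fence and is flagged as an outlier, while the median and IQR are barely affected by it.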
Sources
- OpenIntro Statistics (4th ed.) — Diez, Çetinkaya-Rundel, Barr · CC-BY-SA · Primary source for the term.
- OpenStax — Statistics — Illowsky, Dean · CC-BY · Contextualized application exercises.
- Grinstead & Snell — Introduction to Probability — GNU FDL · Theoretical rigor for discrete r.v., LLN, and CLT.