Lesson 73 — Quartiles, percentiles, and boxplots
5-number summary: min, Q1, median, Q3, max. IQR, boxplots, and the 1.5 IQR rule for detecting outliers. Robust measures for skewed data.
Used in: Stochastik (Leistungskurs, Germany) · H2 Math Statistics (Singapore) · AP Statistics (USA) · Math B (Japan)
Rigorous notation, full derivation, hypotheses
Rigorous definition
Order statistics and percentiles
"The first quartile, Q1, is the value such that 25% of the data fall below it, and the third quartile, Q3, is such that 75% of the data fall below it." (OpenIntro Statistics, §2.1)
Boxplot anatomy: box (Q1 to Q3), median line, whiskers to the non-outlier extreme, isolated points for outliers.
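The quartile definition quoted above can be made concrete by interpolating linearly between order statistics. The sketch below is a minimal illustration, assuming the convention that places the p-th quantile at position (n - 1)·p in the zero-indexed sorted data; other conventions exist and can give slightly different values on small samples. The function name `percentile_linear` and the sample data are hypothetical, chosen only for illustration.

```python
def percentile_linear(data, p):
    """Return the p-th quantile (0 <= p <= 1) by linear interpolation.

    Assumed convention: the quantile sits at position (n - 1) * p in the
    zero-indexed sorted data, interpolating between the two neighbors.
    """
    xs = sorted(data)
    if not xs:
        raise ValueError("data must be non-empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    h = (len(xs) - 1) * p          # fractional position in the sorted data
    lo = int(h)                    # index of the order statistic just below
    hi = min(lo + 1, len(xs) - 1)  # index just above, clamped at the maximum
    frac = h - lo                  # how far to move from xs[lo] toward xs[hi]
    return xs[lo] + frac * (xs[hi] - xs[lo])


# Illustrative data, not taken from the exercise list.
sample = [3, 1, 4, 1, 5, 9, 2, 6]
print(percentile_linear(sample, 0.25))  # 1.75
print(percentile_linear(sample, 0.50))  # 3.5
print(percentile_linear(sample, 0.75))  # 5.25
```

Under this convention, Q1, the median, and Q3 are the values at p = 0.25, 0.5, and 0.75. A textbook that uses the median-of-halves method may report different quartiles for the same data, so it helps to state which convention is in use.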
Solved examples
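As a worked illustration, the boxplot elements (box, median line, whiskers, outliers) can be computed directly. The sketch below assumes Python 3.8+ and the standard-library `statistics.quantiles` with `method="inclusive"` as the quartile convention (textbooks differ on this), together with the usual Tukey fences at Q1 - 1.5·IQR and Q3 + 1.5·IQR. The function name `boxplot_stats` and the data are illustrative assumptions, not part of the lesson.

```python
from statistics import quantiles


def boxplot_stats(data, k=1.5):
    """Five-number summary, Tukey fences, whisker ends, and outliers.

    Assumes the "inclusive" quartile convention of statistics.quantiles;
    other conventions can shift Q1 and Q3 slightly on small samples.
    """
    xs = sorted(data)
    q1, median, q3 = quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Points strictly outside the fences are flagged as outliers.
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    # Whiskers reach the most extreme points that are not outliers.
    inside = [x for x in xs if lower_fence <= x <= upper_fence]
    return {
        "min": xs[0], "q1": q1, "median": median, "q3": q3, "max": xs[-1],
        "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Illustrative data, not taken from the exercise list.
stats = boxplot_stats([2, 4, 4, 5, 7, 8, 9, 11, 12, 30])
print(stats["q1"], stats["median"], stats["q3"])  # 4.25 7.5 10.5
print(stats["fences"])    # (-5.125, 19.875)
print(stats["whiskers"])  # (2, 12)
print(stats["outliers"])  # [30]
```

Here the box spans 4.25 to 10.5, the upper whisker stops at 12 (the largest value inside the fences), and 30 is drawn as an isolated point. Note that the whiskers end at actual data values, not at the fences themselves.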
Exercise list
40 exercises · 10 with worked solutions (25%)
- Ex. 73.1 · Application · Answer key
Data: 1, 3, 5, 7, 9. Calculate the median, Q1, and Q3.
- Ex. 73.2 · Application
Data: 2, 4, 6, 8, 10, 12. Calculate the 5-number summary.
- Ex. 73.3 · Application · Answer key
Grades: 4, 5, 6, 6, 7, 7, 8, 8, 9, 10. Calculate , , .
- Ex. 73.4 · Application
Calculate the IQR of the data: 12, 14, 18, 22, 25, 28, 32.
- Ex. 73.5 · Application · Answer key
Ages: 18, 20, 21, 22, 23, 24, 25, 27, 30, 35, 60. Apply the 1.5 IQR rule. Is there an outlier?
- Ex. 73.6 · Application
Salaries (IQR$.
- Ex. 73.7 · Application · Answer key
For sorted data points, what is the position of by the linear interpolation method?
- Ex. 73.8 · Application
Times (s): 10, 11, 11, 12, 13, 13, 14, 14, 15, 100. Calculate Tukey limits and identify the outlier(s).
- Ex. 73.9 · Application
Weights (kg): 60, 62, 64, 65, 65, 67, 70, 72, 75, 80. Describe all elements of the boxplot.
- Ex. 73.10 · Application
For ,
- Ex. 73.11 · Application
Data with . Using the robust estimator , calculate .
- Ex. 73.12 · Application
How many points above Q3 + 1.5 IQR would we expect in a sample of 1000 normal observations?
- Ex. 73.13 · Application
Boxplot A: narrow box, centered median. Boxplot B: wide box, median close to . Compare dispersion and symmetry of the two sets.
- Ex. 73.14 · Application
Distribution with a long right tail. Where is the mean in relation to the median?
- Ex. 73.15 · Application
Set A has , set B has . Which has more dispersion in the central data?
- Ex. 73.16 · Application
Median of . of , of . Which has a more right-skewed distribution?
- Ex. 73.17 · Application
of company salaries = $30k. Interpret this information.
- Ex. 73.18 · Application
A student is at the of the exam. What does this mean?
- Ex. 73.19 · Application
If , what can be concluded about the data?
- Ex. 73.20 · Understanding
Is the statement "the 1.5 IQR rule flags 5% of data as outliers" correct for normal data?
- Ex. 73.21 · Application · Answer key
Ages (years): 40, 52, 55, 58, 62, 66, 72. Calculate the 5-number summary and check for outliers.
- Ex. 73.22 · Application · Answer key
Grades of 10 students: 3, 5, 6, 7, 7, 8, 8, 9, 10, 10. Construct the complete boxplot, including an outlier check.
- Ex. 73.23 · Modeling
Class of 100 students: , . A student scored 9.5—are they in the top 25%?
- Ex. 73.24 · Modeling
Why do statistical agencies report median income, rather than just the mean, in inequality reports?
- Ex. 73.25 · Modeling
Parts produced with diameter: mm, mm. Specification: mm. Is the process centered? Is there significant risk of rejection?
- Ex. 73.26 · Modeling
A/B test of a site: variant A has median 1.2 s and ; variant B has median 1.1 s and . Which do you prefer for production? Justify using dispersion statistics.
- Ex. 73.27 · Modeling · Answer key
You detect an outlier in financial transactions that appears to be fraud. Should you remove it before analyzing the data? Justify with statistical arguments.
- Ex. 73.28 · Modeling
Response times (ms): 120, 130, 135, 140, 142, 145, 148, 150, 155, 380. Calculate the 5-number summary and evaluate if the system meets a 200 ms SLA based on quartiles.
- Ex. 73.29 · Modeling
Hospital with 4 wings. Stay times (days): Wing A: 5, 8, 9, 10, 12; Wing B: 3, 4, 4, 5, 20; Wing C: 7, 8, 8, 9, 10; Wing D: 2, 3, 15, 18, 25. Construct 5-number summaries and identify which wing is most predictable for bed management.
- Ex. 73.30 · Modeling
Exam scores by school. School A: median 650, . School B: median 620, . Which school has more uniform performance? What does each pattern suggest for pedagogical policy?
- Ex. 73.31 · Modeling
Average monthly precipitation (mm): 234, 181, 130, 83, 68, 52, 44, 47, 82, 122, 145, 201. Calculate the 5-number summary and interpret seasonality.
- Ex. 73.32 · Modeling
Real estate prices in a neighborhood ($k): 250, 280, 310, 320, 340, 350, 380, 390, 420, 1800. Calculate median and mean. Why should a buyer use the median as a reference for typical price?
- Ex. 73.33 · Understanding
Explain, in your own words, why median and IQR are "robust" while mean and standard deviation are not. Use a concrete example.
- Ex. 73.34 · Understanding · Answer key
Can a boxplot hide a bimodal distribution? Construct a concrete example of a bimodal distribution that has the same boxplot as a unimodal one.
- Ex. 73.35 · Understanding · Answer key
For , the is:
- Ex. 73.36 · Challenge
Calculate analytically the of . Express in terms of .
- Ex. 73.37 · Challenge
Argue why the breakdown point of the IQR is 25%, that of the median is 50%, and that of the mean is 0%.
- Ex. 73.38 · Proof · Answer key
Prove: if $X$ is a continuous r.v. with density symmetric around $\mu$, then $\mu$ is the median of $X$.
- Ex. 73.39 · Proof
Show that, as $n \to \infty$, for iid samples from Uniform(0,1), the sample estimator of Q1 converges to 0.25. Use properties of order statistics.
- Ex. 73.40 · Proof
Prove that the median minimizes $\sum_{i=1}^{n} |x_i - c|$ over all values $c$.
Sources
- OpenIntro Statistics (4th ed) — Diez, Çetinkaya-Rundel, Barr · 2019 · EN · CC-BY-SA. Primary source — §2.1 (quartiles, percentiles) and §2.2 (boxplot, outliers).
- Statistics (OpenStax) — Illowsky, Dean · 2022 · EN · CC-BY. §2.3 (percentiles by interpolation) and §2.4 (boxplot and 1.5 IQR rule).
- Introduction to Probability (Grinstead-Snell) — Grinstead, Snell · 1997 · EN · GNU FDL. §5.1 — quartiles of continuous distributions, order statistics.