Lesson 103 — Hypothesis testing: structure and logic
Formal structure of hypothesis testing: H0 vs H1, test statistic, p-value, significance level, Type I and II errors, and test power.
Used in: 3rd year of Ensino Médio (Brazilian upper secondary, ages 17–18) · Equiv. German Stochastik LK · Equiv. Japanese Math B · Singapore H2 Statistics
Rigorous notation, full derivation, hypotheses
Rigorous definition
The five elements of a hypothesis test
"The null hypothesis represents a claim of skepticism. It is the status quo that would be maintained unless there is sufficient evidence against it." — OpenIntro Statistics, §5.1
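The five elements can be made concrete with a one-sample z-test. The sketch below assumes that the five elements are the ones named in the lesson summary: $H_0$, $H_1$, the test statistic, the significance level $\alpha$ and the p-value. The function name `one_sample_z_test` and its parameters are illustrative and do not come from any library. Only Python's standard `statistics.NormalDist` is used.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


@dataclass
class ZTestResult:
    z: float          # test statistic computed from the sample
    p_value: float    # P(statistic at least as extreme as z | H0 true)
    reject_h0: bool   # decision: reject H0 when p_value <= alpha


def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Hypothetical helper for H0: mu = mu0 with known sigma.

    `alternative` encodes H1: "less" (mu < mu0), "greater" (mu > mu0)
    or "two-sided" (mu != mu0); `alpha` is the significance level.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))  # standardized distance from mu0
    if alternative == "less":
        p = STD_NORMAL.cdf(z)
    elif alternative == "greater":
        p = 1 - STD_NORMAL.cdf(z)
    elif alternative == "two-sided":
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return ZTestResult(z=z, p_value=p, reject_h0=p <= alpha)
```

Here is a usage example with made-up inputs. `one_sample_z_test(xbar=103, mu0=100, sigma=15, n=36)` gives $z = 1.2$ and a two-sided p-value of about 0.23. At the 5% level, $H_0$ is therefore not rejected.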
Errors and test power
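Consider a right-tailed z-test of $H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$ with known $\sigma$. The Type I error probability is $\alpha$ by construction. If the true mean is $\mu_1$, the statistic $Z$ is normal with mean $(\mu_1 - \mu_0)\sqrt{n}/\sigma$ and variance 1. Therefore $\beta = \Phi\big(z_{1-\alpha} - (\mu_1 - \mu_0)\sqrt{n}/\sigma\big)$ and the power is $1 - \beta$. Solving "power $\ge 1 - \beta$" for $n$ gives $n = \lceil ((z_{1-\alpha} + z_{1-\beta})\,\sigma/\delta)^2 \rceil$ with $\delta = \mu_1 - \mu_0$.

The sketch below codes these formulas, with the following assumptions:
- The function names are hypothetical.
- The two-sided sample size simply replaces $z_{1-\alpha}$ by $z_{1-\alpha/2}$. This is the usual approximation, which ignores the small probability of rejecting in the wrong tail.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def type_ii_error_right_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """beta = P(fail to reject H0: mu = mu0 | true mean is mu1), right-tailed z-test."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)  # reject H0 when Z >= z_crit
    shift = (mu1 - mu0) * sqrt(n) / sigma   # mean of Z when the true mean is mu1
    return STD_NORMAL.cdf(z_crit - shift)


def power_right_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power = 1 - beta = probability of correctly rejecting H0 when mu = mu1."""
    return 1 - type_ii_error_right_tailed(mu0, mu1, sigma, n, alpha)


def min_sample_size(delta, sigma, alpha=0.05, power=0.80, two_sided=False):
    """Smallest n reaching the target power for a mean shift of size delta (two-sided: approximate)."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = STD_NORMAL.inv_cdf(power)      # z_{1 - beta}
    return ceil(((z_alpha + z_beta) * sigma / abs(delta)) ** 2)
```

Take illustrative values $\delta = 5$, $\sigma = 10$, $\alpha = 0.05$ and 80% power. `min_sample_size(5, 10)` returns 25 for the one-tailed test, and `min_sample_size(5, 10, two_sided=True)` returns 32. Setting `mu1` equal to `mu0` makes the "power" equal to $\alpha$, which is exactly the Type I error rate.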
Formal definition of the p-value
"The p-value measures how consistent the data are with $H_0$. A small p-value indicates that the data are incompatible with $H_0$ — not that $H_0$ is false with probability $1 - p$." — OpenIntro Statistics, §5.1
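For a right-tailed z-test the p-value is $p = P_{H_0}(Z \ge z_{\text{obs}})$. It is a probability computed assuming $H_0$ is true, so it cannot be read as the probability that $H_0$ is false.

The sketch below uses only the standard library, and its function names are hypothetical. It does two things:
- It compares the exact value $1 - \Phi(z_{\text{obs}})$ with a Monte Carlo estimate obtained by repeatedly sampling $Z$ under $H_0$.
- It checks that the rule "reject when $p \le \alpha$" fires in about a fraction $\alpha$ of samples generated under $H_0$.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # distribution of Z under H0 for a z-test


def exact_p_right(z_obs):
    """Right-tailed p-value: P(Z >= z_obs) when H0 is true."""
    return 1 - STD_NORMAL.cdf(z_obs)


def simulated_p_right(z_obs, n_sims=200_000, seed=1):
    """Monte Carlo estimate: share of statistics drawn under H0 that are at least as extreme."""
    rng = random.Random(seed)
    hits = sum(rng.gauss(0.0, 1.0) >= z_obs for _ in range(n_sims))
    return hits / n_sims


def rejection_rate_under_h0(alpha=0.05, n_sims=100_000, seed=2):
    """Fraction of H0-generated statistics whose p-value is <= alpha (should be close to alpha)."""
    rng = random.Random(seed)
    rejections = sum(exact_p_right(rng.gauss(0.0, 1.0)) <= alpha for _ in range(n_sims))
    return rejections / n_sims
```

`exact_p_right(2.0)` is about 0.0228, and the simulated estimate should agree closely. `rejection_rate_under_h0()` should come out near 0.05, which is the Type I error rate $\alpha$.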
Types of alternative hypothesis
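For the same $\alpha$, the three forms of $H_1$ lead to different rejection regions:
- $\mu < \mu_0$: the left tail.
- $\mu > \mu_0$: the right tail.
- $\mu \ne \mu_0$: both tails, each with area $\alpha/2$.

Below is a minimal sketch for a z-statistic. The name `rejects_h0` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def rejects_h0(z, alpha=0.05, alternative="two-sided"):
    """Return True when z falls in the rejection region for the given H1."""
    if alternative == "less":       # H1: mu < mu0 -> left tail of area alpha
        return z <= STD_NORMAL.inv_cdf(alpha)
    if alternative == "greater":    # H1: mu > mu0 -> right tail of area alpha
        return z >= STD_NORMAL.inv_cdf(1 - alpha)
    if alternative == "two-sided":  # H1: mu != mu0 -> area alpha/2 in each tail
        return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
```

Take $z = 1.8$ at $\alpha = 0.05$. `rejects_h0(1.8, alternative="greater")` is `True`, because the critical value is about 1.645. `rejects_h0(1.8)` is `False`, because the two-sided critical value is about 1.96. The same data can be significant one-tailed and not two-tailed, which is why the direction of $H_1$ has to be fixed before looking at the data.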
Solved examples
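As a worked illustration, take the coin scenario of Ex. 103.10: 100 flips, 60 heads, $\alpha = 0.05$. The hypotheses are $H_0: p = 0.5$ and $H_1: p \ne 0.5$. The sketch computes two p-values:
- The normal approximation $Z = (\hat p - p_0)/\sqrt{p_0(1 - p_0)/n}$, without continuity correction.
- For comparison, an exact two-sided binomial p-value computed with `math.comb`.

Both function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def proportion_z_test(successes, n, p0=0.5):
    """Two-sided one-proportion z-test (normal approximation, no continuity correction)."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)          # standard error computed under H0
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value


def exact_binomial_two_sided_fair(successes, n):
    """Exact two-sided p-value for H0: p = 0.5 (symmetric case: double the smaller tail)."""
    upper = sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n  # P(X >= successes)
    lower = sum(comb(n, k) for k in range(0, successes + 1)) / 2 ** n  # P(X <= successes)
    return min(1.0, 2 * min(upper, lower))


z, p_approx = proportion_z_test(60, 100)
p_exact = exact_binomial_two_sided_fair(60, 100)
print(f"z = {z:.2f}, approximate p = {p_approx:.4f}, exact p = {p_exact:.4f}")
```

The normal approximation gives $z = 2$ and $p \approx 0.046$, just below 0.05. The exact binomial p-value is about 0.057, just above it. With a borderline result like this the conclusion depends on the method, so it is worth stating which one is used.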
Exercise list
26 exercises · 6 with worked solution (23%)
- Ex. 103.1 · Application · Answer key
Formulate the hypotheses $H_0$ and $H_1$ for the following scenario: a consumer protection agency wants to verify whether the average weight of a 500 g package of flour matches the declared value.
- Ex. 103.2 · Application
Researchers want to verify whether teenagers sleep less than the recommended 8 hours per night. Formulate $H_0$ and $H_1$.
- Ex. 103.3 · Application
, . Data: , , (known). Calculate the z-statistic and the p-value. Conclude for .
- Ex. 103.4 · Application
A manufacturer claims its bulbs last on average 1000 h. A sample of bulbs gives h with h (known). At the 5% level, is the average lifespan less than claimed?
- Ex. 103.5 · Application
In a criminal trial, $H_0$ is "the defendant is innocent" and $H_1$ is "the defendant is guilty". Describe Type I and Type II errors in this context. Which is considered more serious in the legal system, and why?
- Ex. 103.6 · Understanding
A test results in . Which of the statements below is correct?
- Ex. 103.7 · Understanding
A test with results in . The researcher concludes "the effect does not exist". What might be wrong?
- Ex. 103.8 · Application
A school implemented a new methodology. The historical average grade is points. After intervention, students had and (known). At the 5% level, did the grade improve?
- Ex. 103.9 · Application
A clinic wants to detect a 5 min reduction in service time (, ). With and 90% power, what is the minimum sample size $n$?
- Ex. 103.10 · Application · Answer key
A coin is flipped 100 times and gets 60 heads. At the 5% level, is the coin fair?
- Ex. 103.11 · Application
A researcher changes the significance level from to while keeping fixed. Explain the effect on Type II Error and test power.
- Ex. 103.12 · Application · Answer key
Normal fasting blood glucose: mg/dL. A sample of diabetics gives mg/dL with mg/dL. At the 1% level, is average blood glucose elevated?
- Ex. 103.13 · Understanding
A result is "statistically significant at 5%". What does this correctly mean?
- Ex. 103.14 · Application
A company wants to detect whether the average weight of its products dropped from g to g, with g, and 80% power. What is the minimum sample size $n$?
- Ex. 103.15 · Application
A genomics study performs 1000 simultaneous tests with . All tested genes are null (no real effect). How many false positives are expected? If 60 genes are "significant", what is the estimated false discovery rate?
- Ex. 103.16 · Application
A coin is flipped 800 times and gets 384 heads. At the 5% level, is the coin fair?
- Ex. 103.17 · Application · Answer key
A survey with teenagers recorded an average sleep of h with h (from previous studies). At the 5% level, do they sleep less than 8 hours?
- Ex. 103.18 · Understanding · Answer key
Which of the statements about statistical significance is correct?
- Ex. 103.19 · Modeling
A clinical trial tests 20 endpoints simultaneously with . What is the probability of at least one false positive without correction? Describe how the Bonferroni correction addresses the problem and discuss its main limitation.
- Ex. 103.20 · Application
A school's historical ENEM pass rate is 30%. After a new methodology was introduced, 38 of 100 students passed. At the 5% level, did the pass rate improve?
- Ex. 103.21 · Application
Test vs with and . Calculate the p-value for and . What does this reveal about the p-value and effect size?
- Ex. 103.22 · Application · Answer key
Normal systolic pressure: mmHg. Sample of sedentary adults: mmHg, mmHg. At the 1% level, is average pressure elevated?
- Ex. 103.23 · Application
A veterinary study wants to detect whether the average weight of pigs of one breed has changed from 125 kg to 120 kg (, ). With a two-tailed test and 80% power, how many animals are needed?
- Ex. 103.24 · Modeling
A school's ENEM has points against state average, with and students. The result is "highly significant" (). Calculate Cohen's effect size $d$. Is the 2-point difference educationally relevant? Discuss.
- Ex. 103.25 · Challenge
Show that, when $H_0$ is true, the p-value has a Uniform(0,1) distribution for tests with a continuous test statistic. Use this result to verify that $P(\text{p-value} \le \alpha \mid H_0) = \alpha$.
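- Ex. 103.26 · Proof
Use the Neyman-Pearson lemma to show that the one-tailed z-test (reject $H_0$ if $Z \ge z_{1-\alpha}$) is the most powerful level-$\alpha$ test for $H_0: \mu = \mu_0$ vs $H_1: \mu = \mu_1$ with $\mu_1 > \mu_0$, normal data and known $\sigma$.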
Sources
- OpenIntro Statistics (4th ed.) — Diez, Çetinkaya-Rundel, Barr · CC-BY-SA. Sections §5.1–5.3 (test structure, p-value, power, sample size).
- Statistics (OpenStax) — Illowsky, Dean · CC-BY. Chapter 9 (null and alternative hypotheses, Type I and II errors, complete examples with z).
- Statistical Thinking for the 21st Century — Russell Poldrack · CC-BY-NC. Chapters 10–11 (replicability crisis, responsible use of p-value, FDR, effect size).