Math Club
v1 · canonical standard

Lesson 108 — Chi-squared test: goodness of fit and independence

Chi-squared statistics: asymptotic distribution, degrees of freedom, goodness-of-fit test and independence test in contingency tables. Yates correction, Cramér's V.

Used in: 3rd-year high school · Stochastik LK (Germany) · H2 Statistics (Singapore) · Math B (Japan), under inferential statistics

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
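
The statistic above translates directly into code. A minimal Python sketch follows; the function name `chi_squared_statistic` and the counts are made up for illustration and are not taken from the lesson.

```python
def chi_squared_statistic(observed, expected):
    """Pearson statistic: sum over categories of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative (made-up) counts: 100 observations over 4 categories,
# compared against a uniform expectation of 25 per category.
observed = [20, 30, 22, 28]
expected = [25, 25, 25, 25]
stat = chi_squared_statistic(observed, expected)
print(stat)  # (25 + 25 + 9 + 9) / 25 = 2.72, up to floating-point rounding
```
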

Rigorous notation, full derivation, hypotheses

Rigorous definition

Chi-squared distribution

"Chi-squared distributions have an additivity property: if $X_1 \sim \chi^2_{n_1}$ and $X_2 \sim \chi^2_{n_2}$ are independent, then $X_1 + X_2 \sim \chi^2_{n_1 + n_2}$." — OpenStax Statistics, §11.1

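The additivity property can be checked empirically. The sketch below is a simulation, not a proof. It builds each chi-squared draw as a sum of squared standard normals, then compares the sample mean and variance of the sum with $n_1 + n_2$ and $2(n_1 + n_2)$, the mean and variance of a chi-squared variable with $n_1 + n_2$ degrees of freedom. The seed, the sample size and the function name are arbitrary choices.

```python
import random
import statistics


def chi_squared_draw(df, rng):
    """One chi-squared draw, built as a sum of `df` squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


rng = random.Random(108)  # fixed seed so the run is repeatable
n1, n2, draws = 2, 3, 20_000

# X1 ~ chi2(n1) and X2 ~ chi2(n2), drawn independently, then added.
sums = [chi_squared_draw(n1, rng) + chi_squared_draw(n2, rng) for _ in range(draws)]

sample_mean = statistics.fmean(sums)
sample_var = statistics.variance(sums)
# Expect values close to n1 + n2 = 5 and 2 * (n1 + n2) = 10.
print(sample_mean, sample_var)
```

Agreement of the first two moments is only a sanity check; it does not establish that the whole distribution matches.
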
Goodness-of-fit test

Independence test in an $r \times c$ table

"Expected frequencies for an independence test are calculated assuming that the row proportions are equal across all columns. If the null hypothesis is true (variables independent), this assumption is satisfied." — OpenIntro Statistics, §6.4

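Under independence, each expected count is the row total times the column total divided by the grand total, $E_{ij} = R_i C_j / n$. A minimal Python sketch, with a made-up $2 \times 2$ table and a hypothetical function name:

```python
def expected_frequencies(table):
    """Expected counts E_ij = R_i * C_j / n under the independence hypothesis."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Illustrative (made-up) table: row totals 40 and 60, column totals 50 and 50.
table = [[30, 10],
         [20, 40]]
expected = expected_frequencies(table)
print(expected)  # [[20.0, 20.0], [30.0, 30.0]]
```

Each row of the expected table has the same proportions across columns, which is the assumption described in the quotation.
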
Assumptions for validity (Cochran's rule)
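
Cochran's rule, in the form used in this lesson ($E_i \geq 5$), can be checked mechanically before running the test. A minimal Python sketch follows; the function name, the default threshold argument and the sample values are illustrative assumptions.

```python
def cells_below_threshold(expected, threshold=5.0):
    """Return the expected counts that fall below `threshold`."""
    return [e for e in expected if e < threshold]


# Illustrative (made-up) expected counts for five cells.
expected_counts = [12.0, 4.0, 9.5, 18.0, 6.5]
small_cells = cells_below_threshold(expected_counts)
print(small_cells)  # [4.0]
```

A non-empty result signals that the chi-squared approximation may be unreliable for this table.
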

Yates correction ($2 \times 2$ table)
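
A minimal Python sketch of the corrected statistic, assuming the usual textbook form $\sum (|O - E| - 0.5)^2 / E$. The function name and the counts are made up for illustration. The sketch applies the formula as written, with no guard for cells where $|O - E|$ is below 0.5.

```python
def chi_squared_2x2(table, yates=False):
    """Chi-squared statistic for a 2x2 table, optionally with Yates correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    shift = 0.5 if yates else 0.0  # continuity correction subtracted from |O - E|
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (abs(observed - expected) - shift) ** 2 / expected
    return stat


# Illustrative (made-up) table.
table = [[30, 10],
         [20, 40]]
plain = chi_squared_2x2(table)
corrected = chi_squared_2x2(table, yates=True)
print(plain, corrected)  # the corrected value is the smaller of the two
```
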

Effect size: Cramér's V
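
Assuming the usual definition $V = \sqrt{\chi^2 / (n \cdot \min(r-1, c-1))}$, Cramér's V can be computed as in the Python sketch below. The function name and the numbers are illustrative only.

```python
import math


def cramers_v(chi2, n, rows, cols):
    """Cramér's V for an r x c table: sqrt(chi2 / (n * min(r - 1, c - 1)))."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))


# Illustrative (made-up) values: chi2 = 12 from a 3x3 table with n = 150.
v = cramers_v(12.0, 150, 3, 3)
print(v)  # sqrt(12 / 300) = 0.2, up to floating-point rounding
```

Unlike $\chi^2$ itself, V is scaled by the sample size, so it stays comparable across samples of different sizes.
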

Figure: chi-squared density $f(x)$ for $df = 5$, with the critical value and the critical region marked. The yellow region to the right of the critical value is the rejection region for $H_0$ at level $\alpha = 5\%$.
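
The critical value in the figure can be reproduced numerically. The Python sketch below uses only the standard library. It computes the right-tail probability for integer degrees of freedom through the recurrence $Q(a+1, t) = Q(a, t) + t^a e^{-t} / \Gamma(a+1)$ for the regularized upper incomplete gamma function, then finds the critical value by bisection. Function names and the tolerance are assumptions of this sketch.

```python
import math


def chi_squared_sf(x, df):
    """Right-tail probability P(X > x) for X ~ chi-squared, integer df >= 1."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-t)             # Q(1, t)
    else:
        a, tail = 0.5, math.erfc(math.sqrt(t))  # Q(1/2, t)
    # Step a up to df / 2 with Q(a + 1, t) = Q(a, t) + t**a * exp(-t) / Gamma(a + 1).
    while a < df / 2.0:
        tail += t ** a * math.exp(-t) / math.gamma(a + 1.0)
        a += 1.0
    return tail


def chi_squared_critical_value(alpha, df, tol=1e-9):
    """Value c with P(X > c) = alpha, found by bisection on the tail probability."""
    lo, hi = 0.0, 1.0
    while chi_squared_sf(hi, df) > alpha:  # grow the bracket until it contains c
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi_squared_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


critical = chi_squared_critical_value(0.05, 5)
print(round(critical, 2))  # 11.07
```

The null hypothesis is rejected at the 5% level when the observed statistic exceeds this value.
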

Worked examples

Exercise list

42 exercises · 10 with worked solutions (24%)

Application 23 · Understanding 8 · Modeling 7 · Challenge 2 · Proof 2
  1. Ex. 108.1 · Application

    A six-sided die is rolled 60 times. What is the number of degrees of freedom in the goodness-of-fit test for the uniform distribution?

  2. Ex. 108.2 · Application

    For the die from the previous exercise rolled 60 times, what is the expected frequency per face?

  3. Ex. 108.3 · Application

    A die is rolled 60 times, giving observed counts 12, 8, 11, 9, 13, 7 for faces 1 to 6. Calculate $\chi^2$ and conclude at the 5% level.

  4. Ex. 108.4 · Application

    Calculate the degrees of freedom for the independence test in a $3 \times 4$ contingency table.

  5. Ex. 108.5 · Application

    In a contingency table with row total $R_1 = 80$, column total $C_1 = 120$ and $n = 300$, calculate $E_{11}$.

  6. Ex. 108.6 · Application

    In an independence test, we obtained $\chi^2 = 8$ with $df = 2$. The critical value at 5% is 5.99. What is the conclusion?

  7. Ex. 108.7 · Application · Answer key

    Calculate Cramér's V for $\chi^2 = 20$, $n = 200$ and a $3 \times 4$ table (so $\min(r-1, c-1) = 2$).

  8. Ex. 108.8 · Application · Answer key

    In which situations should the Yates correction be applied to the chi-squared test?

  9. Ex. 108.9 · Application · Answer key

    A researcher has $E_i = 3$ in two of the five cells of a table. Is the chi-squared test appropriate? Justify.

  10. Ex. 108.10 · Application

    We observe $O = (45, 35, 20)$ in $n = 100$ observations, with expected proportions $(0.5,\; 0.3,\; 0.2)$. Calculate $\chi^2$.

  11. Ex. 108.11 · Application

    For the previous exercise (3 categories, fully specified distribution), what is the number of degrees of freedom?

  12. Ex. 108.12 · Application

    A study measures blood pressure (high/normal) in the same group of patients before and after an exercise program. Why is the standard independence chi-squared test inadequate?

  13. Ex. 108.13 · Application

    What is the critical value $\chi^2_{0.05;\,1}$ (chi-squared with 1 degree of freedom at the 5% level)?

  14. Ex. 108.14 · Application

    Calculate the expected frequencies for the $2 \times 2$ table with cells $a = 80$, $b = 20$, $c = 60$, $d = 40$.

  15. Ex. 108.15 · Application

    With the expected values from the previous exercise, calculate $\chi^2$ and conclude at 5%.

  16. Ex. 108.16 · Application · Answer key

    Why is $df = 1$ in every $2 \times 2$ contingency table? Explain geometrically or algebraically.

  17. Ex. 108.17 · Application

    What are the mean and variance of $\chi^2_k$? For $k = 20$, is the distribution approximately symmetric?

  18. Ex. 108.18 · Application

    In a goodness-of-fit test with $k$ categories, how do the degrees of freedom change when we estimate $r$ parameters of the distribution from the data itself?

  19. Ex. 108.19 · Application

    Show that the chi-squared statistic $\chi^2 = \sum (O_i - E_i)^2 / E_i$ is always non-negative.

  20. Ex. 108.20 · Application · Answer key

    Is the goodness-of-fit chi-squared test one-tailed (right tail) or two-tailed? Why?

  21. Ex. 108.21 · Application

    We observe $O = (10, 20, 30, 40)$ in $n = 100$ observations, with a uniform distribution expected. Calculate $\chi^2$ and conclude at 1%.

  22. Ex. 108.22 · Application

    What is the conceptual difference between a homogeneity test and an independence test? Does the formula for $\chi^2$ change?

  23. Ex. 108.23 · Application

    $\chi^2 = 14.5$ with $df = 5$. What is the conclusion at 5% and at 1%? (Critical values: 11.07 and 15.09 respectively.)

  24. Ex. 108.24 · Understanding

    What would it mean to obtain $\chi^2 = 0$ in a goodness-of-fit test? Is this possible in real data?

  25. Ex. 108.25 · Understanding

    Why do very large samples make $\chi^2$ a problematic measure? What alternative should be used?

  26. Ex. 108.26 · Understanding · Answer key

    Describe the shape of the chi-squared curve for small $df$ (e.g. $df = 2$) vs. large $df$ (e.g. $df = 20$). How does this relate to its origin as a sum of squares?

  27. Ex. 108.27 · Understanding · Answer key

    Which formula below is Pearson's chi-squared statistic?

  28. Ex. 108.28 · Understanding

    Explain why Cochran's rule ($E_i \geq 5$) is necessary for the validity of the chi-squared test.

  29. Ex. 108.29 · Modeling · Answer key

    A dihybrid cross in peas predicts phenotypes in the ratio 9:3:3:1. Among 160 offspring, the observed counts are 95, 30, 27, 8. Test goodness of fit at 5%.

  30. Ex. 108.30 · Modeling

    A survey of 400 university students (200 men, 200 women) tabulates opinions on quotas (Favorable/Neutral/Against): men 70/60/70, women 110/50/40. Test independence at 5%.

  31. Ex. 108.31 · Modeling

    A sample of 200 M&M's from a package shows: 30 red, 35 orange, 22 yellow, 40 green, 55 blue, 18 brown. According to the manufacturer, proportions are 13%, 20%, 14%, 16%, 24%, 13%. Test goodness of fit at 5%.

  32. Ex. 108.32 · Modeling

    An A/B/C test on a landing page has 200 visitors per variation. Conversions: A = 24, B = 30, C = 40. Test homogeneity of conversion rates at 5%.

  33. Ex. 108.33 · Modeling

    Four machines produce 30, 40, 25 and 35 defects respectively (total 130). Test whether the defect rate is uniform among machines at the 5% level.

  34. Ex. 108.34 · Modeling · Answer key

    In a clinical trial with 50 patients (25 per group), the vaccine group had 18 cures and the placebo group had 12. Build the $2 \times 2$ table and apply the chi-squared test with Yates correction at 5%.

  35. Ex. 108.35 · Modeling

    Do DNIT highway accident data follow a Poisson distribution? Describe the complete flowchart for the goodness-of-fit test, including how to handle the unknown parameter.

  36. Ex. 108.36 · Understanding

    Which condition below is necessary for the validity of the independence chi-squared test?

  37. Ex. 108.37 · Understanding

    In a before-after study, the same 80 patients are classified as hypertensive or normal before and after an intervention. Why use McNemar's test instead of the standard chi-squared test?

  38. Ex. 108.38 · Understanding

    A survey of 500 Brazilians records region (North, Southeast, South) and payment preference (cash vs. installment). Which test is most appropriate to verify whether preference and region are independent?

  39. Ex. 108.39 · Challenge

    An emergency room recorded 210 visits in one week (30 per day expected). Observed: Sun=18, Mon=40, Tue=28, Wed=25, Thu=29, Fri=42, Sat=28. Is the flow uniform across days? Test at 5%.

  40. Ex. 108.40 · Challenge

    An electoral survey in 3 Brazilian states (SP, RJ, MG) with 600 voters (200 per state) records candidate preference (A, B, C). Data: SP=(80,70,50), RJ=(60,90,50), MG=(70,60,70). Test independence between state and candidate at 5% and calculate Cramér's V.

  41. Ex. 108.41 · Proof · Answer key

    Show that for $k = 2$ categories, $\chi^2 = Z^2$, where $Z$ is the statistic of the two-sided $z$-test for a proportion. This explains why $\chi^2_{0.05;\,1} = (1.96)^2 \approx 3.84$.

  42. Ex. 108.42 · Proof

    Prove the formula $df = (r-1)(c-1)$ for the independence test in an $r \times c$ table, explaining how many independent constraints the margins impose on the count vector.
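
The identity $\chi^2 = Z^2$ for two categories (Ex. 108.41) can be sanity-checked numerically before attempting the proof. The Python sketch below is a check on one made-up data set, not a proof, and the function names are hypothetical. It assumes $Z = (\hat{p} - p_0) / \sqrt{p_0 (1 - p_0) / n}$.

```python
import math


def z_statistic(successes, n, p0):
    """z statistic for a one-sample proportion test against p0."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)


def chi_squared_two_categories(successes, n, p0):
    """Goodness-of-fit statistic for the two categories success / failure."""
    observed = (successes, n - successes)
    expected = (n * p0, n * (1 - p0))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Illustrative (made-up) numbers: 60 successes in 100 trials, tested against p0 = 0.5.
z = z_statistic(60, 100, 0.5)
chi2 = chi_squared_two_categories(60, 100, 0.5)
print(z ** 2, chi2)  # the two values agree up to rounding error
```
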

Sources

Updated on 2026-05-11 · Author(s): Clube da Matemática
