Math Club
v1 · canonical standard

Lesson 79 — Deep Dive into Bayes' Theorem

Priors, posteriors, and sequential updating. Odds form, Beta-binomial conjugate prior, base rate fallacy, Naive Bayes. Applications in medical diagnosis, spam filtering, and ML.

Used in: Stochastik LK (Germany) · H2 Math Statistics (Singapore) · Math B (Japan) · equivalent to AP Statistics (USA)

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$
Rigorous notation, full derivation, hypotheses

Definitions and theorems

Conditional probability

"The conditional probability $P(E \mid F)$, the probability of $E$ given $F$, expresses the probability of $E$ when we know that $F$ has occurred. It can be computed using the formula $P(E \mid F) = P(EF)/P(F)$, assuming $P(F) > 0$." — Grinstead & Snell, Introduction to Probability, §4.1

Law of total probability
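
As a minimal sketch, assuming the hypotheses $H_1, \dots, H_n$ form a partition, the law of total probability gives $P(E) = \sum_i P(E \mid H_i)\,P(H_i)$, and dividing each term by $P(E)$ yields the posteriors of the formula at the top of the lesson. The function names and the numbers below are illustrative assumptions, not taken from the lesson.

```python
def total_probability(priors, likelihoods):
    """P(E) = sum_i P(E | H_i) * P(H_i) over a partition H_1..H_n."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1 (the H_i must form a partition)")
    return sum(lik * p for lik, p in zip(likelihoods, priors))


def posteriors(priors, likelihoods):
    """P(H_i | E) for every hypothesis, by normalizing with P(E)."""
    evidence = total_probability(priors, likelihoods)
    if evidence == 0:
        raise ValueError("P(E) = 0: posteriors are undefined")
    return [lik * p / evidence for lik, p in zip(likelihoods, priors)]


# Illustrative values (assumed for this sketch only).
priors = [0.5, 0.3, 0.2]        # P(H_1), P(H_2), P(H_3)
likelihoods = [0.1, 0.2, 0.5]   # P(E | H_1), P(E | H_2), P(E | H_3)

print(round(total_probability(priors, likelihoods), 4))          # 0.21
print([round(x, 4) for x in posteriors(priors, likelihoods)])    # [0.2381, 0.2857, 0.4762]
```

The hypothesis with the largest prior ($H_1$) ends up with the smallest posterior here, because its likelihood is the lowest of the three.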

Bayes' theorem

"Bayes' Theorem is just a formula that comes from the definition of conditional probability. Yet it is extremely powerful, and is the key to understanding what it means to rationally revise your beliefs in light of new evidence." — OpenIntro Statistics 4e, §3.2

Odds form
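
A minimal sketch of the odds form, posterior odds = LR × prior odds (the identity stated in Ex. 79.39), with $\text{LR}^+ = \text{sens}/(1 - \text{spec})$ as in Ex. 79.12. The helper names and the numerical values are assumptions chosen for illustration.

```python
def prob_to_odds(p):
    """Convert a probability p into odds p : (1 - p)."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)


def positive_lr(sens, spec):
    """Positive likelihood ratio: P(test+ | H) / P(test+ | not H)."""
    return sens / (1.0 - spec)


# Illustrative values (assumed): prior 10%, sensitivity 80%, specificity 90%.
prior_odds = prob_to_odds(0.10)        # 1:9
lr_plus = positive_lr(0.8, 0.9)        # about 8
posterior_odds = lr_plus * prior_odds  # about 8:9

print(round(posterior_odds, 4))                # 0.8889
print(round(odds_to_prob(posterior_odds), 4))  # 0.4706
```

The same posterior follows from the usual form: $0.8 \cdot 0.1 / (0.8 \cdot 0.1 + 0.1 \cdot 0.9) = 8/17$.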

Sequential updating
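
Sequential updating can be sketched as a loop in which the posterior after one positive test becomes the prior for the next, as described in Ex. 79.11. The sketch assumes the tests are conditionally independent given the true disease status; the function names and the numbers are illustrative.

```python
def update_on_positive(prior, sens, spec):
    """Posterior P(sick | test+) after one positive result."""
    true_pos = sens * prior
    false_pos = (1.0 - spec) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


def sequential_update(prior, tests):
    """Apply positive results one at a time; each posterior is the next prior.

    tests is a list of (sensitivity, specificity) pairs.
    Assumes conditional independence of the tests given disease status.
    """
    history = [prior]
    for sens, spec in tests:
        history.append(update_on_positive(history[-1], sens, spec))
    return history


# Illustrative values (assumed): prior 10%, two tests with sens 80% and spec 90%.
history = sequential_update(0.10, [(0.8, 0.9), (0.8, 0.9)])
print([round(b, 4) for b in history])  # [0.1, 0.4706, 0.8767]
```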

Beta-binomial conjugate prior
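
A minimal sketch of the conjugate update, using the rule stated in Ex. 79.34: a $\text{Beta}(\alpha, \beta)$ prior with $k$ successes in $n$ trials gives a $\text{Beta}(\alpha + k, \beta + n - k)$ posterior. The function names and the example values are assumptions for illustration.

```python
def beta_binomial_update(alpha, beta, successes, trials):
    """Return the posterior (alpha, beta) after observing the data."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return alpha + successes, beta + (trials - successes)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Illustrative values (assumed): Beta(3, 3) prior, 4 successes in 6 trials.
post_alpha, post_beta = beta_binomial_update(3, 3, 4, 6)
print((post_alpha, post_beta))                    # (7, 5)
print(round(beta_mean(post_alpha, post_beta), 4))  # 0.5833
```

The posterior mean $7/12 \approx 0.583$ lies between the prior mean $0.5$ and the observed proportion $4/6 \approx 0.667$.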

SVG — Bayes diagram in 2×2 table

Figure: 2×2 table, Positive Predictive Value.
Sick (prevalence p), Test +: sens · p (true positive, TP)
Healthy (1 − p), Test +: (1 − spec) · (1 − p) (false positive, FP)
PPV = TP / (TP + FP) = posterior P(sick | test+)
Low prevalence → FP dominates → low PPV (base rate fallacy)

Absolute frequency diagram. The PPV (Positive Predictive Value) is the Bayesian posterior P(sick | positive test). When prevalence is low, false positives can outnumber true positives even with a high-quality test.
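
The frequency reasoning in the diagram can be sketched in a few lines of Python. The function name, the population size of 10,000, and the test characteristics below are illustrative assumptions; the formulas are those of the 2×2 table.

```python
def ppv_by_frequencies(prevalence, sens, spec, population=10_000):
    """Return (TP, FP, PPV) using absolute frequencies in a population."""
    sick = prevalence * population
    healthy = population - sick
    true_pos = sens * sick               # sick people who test positive
    false_pos = (1.0 - spec) * healthy   # healthy people who test positive
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


# Illustrative values (assumed): the same 90%/90% test at two prevalences.
for prevalence in (0.02, 0.40):
    tp, fp, ppv = ppv_by_frequencies(prevalence, sens=0.9, spec=0.9)
    print(f"prevalence={prevalence:.0%}  TP={tp:.0f}  FP={fp:.0f}  PPV={ppv:.3f}")
# prevalence=2%  TP=180  FP=980  PPV=0.155
# prevalence=40%  TP=3600  FP=600  PPV=0.857
```

With the test held fixed, only the prevalence changes between the two rows, which isolates the base rate effect.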

Solved examples

Exercise list

40 exercises · 10 with worked solution (25%)

Application 18 · Understanding 4 · Modeling 10 · Challenge 5 · Proof 3
  1. Ex. 79.1 · Application · Answer key

    $P(A) = 0.3$, $P(B) = 0.5$, $P(A \cap B) = 0.15$. Calculate $P(A \mid B)$.

  2. Ex. 79.2 · Application

    $P(A \mid B) = 0.6$, $P(B) = 0.5$. Calculate $P(A \cap B)$.

  3. Ex. 79.3 · Application

    $P(A) = 0.1$, $P(B \mid A) = 0.8$, $P(B \mid \bar A) = 0.2$. Calculate $P(B)$.

  4. Ex. 79.4 · Application

    With the data from exercise 79.3, calculate $P(A \mid B)$.

  5. Ex. 79.5 · Application · Answer key

    Disease with 0.5% prevalence. Diagnostic test: 95% sensitivity, 95% specificity. Calculate the PPV using frequencies in 10,000 people.

  6. Ex. 79.6 · Application · Answer key

    Same data as exercise 79.5, but with 50% prevalence. Calculate the PPV and compare with the previous result.

  7. Ex. 79.7 · Application

    Spam filter: $P(\text{spam}) = 0.3$. The word "FREE" appears in 60% of spam and 5% of legitimate emails. Calculate $P(\text{spam} \mid \text{FREE})$.

  8. Ex. 79.8 · Application

    Urn A: 2 red, 3 blue. Urn B: 5 red, 1 blue. An urn is chosen at random and a red ball is drawn. What is the probability the urn is A?

  9. Ex. 79.9 · Application · Answer key

    3 coins: 2 fair, 1 double-headed. One is chosen at random, flipped once, comes up heads. What is the probability the chosen coin is the double-headed one?

  10. Ex. 79.10 · Application

    $P(\text{smoker}) = 0.2$, $P(\text{cancer} \mid \text{smoker}) = 0.1$, $P(\text{cancer} \mid \neg\text{smoker}) = 0.01$. Given a person has cancer, what is the probability they are a smoker?

  11. Ex. 79.11 · Application

    Sequential updating: two positive tests with 90% sensitivity and 90% specificity, applied to a disease with 1% prevalence. Use the posterior of the 1st test as the prior of the 2nd. What is the PPV after both consecutive positive results?

  12. Ex. 79.12 · Application

    For a test with 90% sensitivity and 95% specificity, calculate the positive likelihood ratio $\text{LR}^+ = \text{sens}/(1 - \text{spec})$.

  13. Ex. 79.13 · Application

    Prior odds of 1:99 (1% prevalence). $\text{LR}^+ = 18$ (90% sensitivity, 95% specificity). Calculate the posterior odds and the posterior probability.

  14. Ex. 79.14 · Application

    Which of the following values is the correct posterior in a context with prior odds 1:99 and $\text{LR}^+ = 18$?

  15. Ex. 79.15 · Application

    Prior $\theta \sim \text{Beta}(2, 2)$. 7 heads observed in 10 flips. Determine the posterior.

  16. Ex. 79.16 · Application

    Prior $\theta \sim \text{Beta}(1, 1)$ (uniform). 0 heads observed in 5 flips. Determine the posterior and its mean.

  17. Ex. 79.17 · Application

    In exercise 79.15, what is the posterior mean?

  18. Ex. 79.18 · Application

    Prior $\theta \sim \text{Beta}(2, 8)$. New batch: 30 parts inspected, 6 defective. Determine the posterior and posterior mean.

  19. Ex. 79.19 · Modeling · Answer key

    COVID-19 in endemic phase: 5% prevalence. Rapid test: 80% sensitivity, 95% specificity. Calculate the PPV using frequencies in 10,000 people. Is it worth automatically isolating all positives?

  20. Ex. 79.20 · Modeling

    Naive Bayes for email: $P(\text{spam}) = 0.3$. In training, "FREE" appears in 60% of spam and 5% of ham; "won" appears in 50% of spam and 10% of ham. An email contains both words. Classify it assuming conditional independence.

  21. Ex. 79.21 · Modeling

    Three diseases: A (10% of the population), B (5%), C (1%). A patient presents symptom S, with $P(S \mid A) = 0.3$, $P(S \mid B) = 0.9$, $P(S \mid C) = 0.9$. Which disease is most likely given S?

  22. Ex. 79.22 · Modeling

    Prosecutor's fallacy: DNA evidence has a frequency of 1/1000 in the population. The prosecutor claims the probability of innocence is 1/1000. Why is this reasoning wrong? Calculate the correct posterior assuming there are 100,000 plausible suspects in the city.

  23. Ex. 79.23 · Modeling · Answer key

    Fraud classifier: 95% sensitivity, 99.9% specificity. Frauds: 0.1% of transactions. Calculate the PPV. How many false positives for every true positive?

  24. Ex. 79.24 · Modeling

    Pregnancy test: 99% sensitivity, 98% specificity. Woman with prior probability of pregnancy of 30%. Calculate the PPV.

  25. Ex. 79.25 · Modeling · Answer key

    Polygraph: 70% sensitivity, 80% specificity. It is used in the interrogation of a suspect with a 5% prior probability of guilt. Calculate the posterior after a positive result. Is the result admissible as sufficient evidence to convict?

  26. Ex. 79.26 · Modeling · Answer key

    Two independent positive tests ($\text{sens}_1 = 0.9$, $\text{spec}_1 = 0.95$; $\text{sens}_2 = 0.85$, $\text{spec}_2 = 0.90$). Prevalence 2%. Calculate the posterior after both positive results via sequential updating.

  27. Ex. 79.27 · Modeling

    In a lineup, the red-haired suspect (H) has a 70% prior probability of being the culprit. A witness identifies the red-haired suspect with 90% probability when the culprit is H, and erroneously 15% of the time when the culprit is not H. Given the witness pointed to H, what is the posterior probability of guilt?

  28. Ex. 79.28 · Modeling

    Quality control with 3 lines (A: 40% of production, 2% defect; B: 35%, 3%; C: 25%, 5%). A defective part is found. Determine the probability of each line being the origin.

  29. Ex. 79.29 · Understanding

    What is the base rate fallacy?

  30. Ex. 79.30 · Understanding

    Why does the prior matter even in "objective science"? An analysis that ignores the prior is equivalent to what implicit assumption?

  31. Ex. 79.31 · Understanding

    Two independent positive tests with likelihood ratios $r_1$ and $r_2$. What is the effect on the odds form?

  32. Ex. 79.32 · Understanding

    What is the practical difference between using a Beta(1,1) prior and a Beta(10,10) prior for a coin? In which case will the posterior be more sensitive to new data?

  33. Ex. 79.33 · Challenge

    Show that two positive tests that are conditionally independent given $H$ (and given $\neg H$) result in posterior odds equal to $r_1 \times r_2 \times$ prior odds, where $r_i = \text{LR}_i^+$.

  34. Ex. 79.34 · Challenge

    Demonstrate that the posterior of the Bernoulli-Beta model is $\text{Beta}(\alpha + k, \beta + n - k)$ when the prior is $\text{Beta}(\alpha, \beta)$ and we observe $k$ successes in $n$ trials.

  35. Ex. 79.35 · Proof

    Demonstrate Bayes' theorem from the definition of conditional probability and the law of total probability.

  36. Ex. 79.36 · Proof

    Show that $P(A \mid B) = P(B \mid A)\,P(A)/P(B)$ using only the definition of conditional probability. Explain why $P(A \mid B) \neq P(B \mid A)$ in general.

  37. Ex. 79.37 · Challenge

    Monty Hall problem with 3 doors. Use Bayes to calculate the probability of the car being behind each door after Monty (who knows where the car is) opens an empty door. Should you switch?

  38. Ex. 79.38 · Challenge · Answer key

    In Naive Bayes with binary features, show that the classifier is equivalent to multiplying the individual LRs of each feature. What happens when the conditional independence assumption is violated?

  39. Ex. 79.39 · Proof · Answer key

    Demonstrate that the odds form of Bayes, posterior odds = LR $\times$ prior odds, follows directly from the usual form of Bayes' theorem for two complementary events $H$ and $\neg H$.

  40. Ex. 79.40 · Challenge

    Show that the mean of the posterior $\text{Beta}(\alpha + k, \beta + n - k)$ converges to the maximum likelihood estimator $k/n$ as $n \to \infty$, for any fixed prior $\text{Beta}(\alpha, \beta)$. What does this imply about the relationship between Bayes and frequentism for large samples?
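
For checking answers to the Naive Bayes exercises (79.20 and 79.38), the following is a minimal Python sketch that multiplies the per-word likelihood ratios into the prior odds. It assumes conditional independence of the words given the class and only uses words that are present in the email; the function name and the numbers are illustrative and deliberately differ from those in the exercises.

```python
def naive_bayes_posterior(prior_spam, word_likelihoods):
    """Return P(spam | observed words) under the naive independence assumption.

    word_likelihoods is a list of (P(word | spam), P(word | ham)) pairs,
    one pair for each word observed in the email.
    """
    odds = prior_spam / (1.0 - prior_spam)   # prior odds of spam
    for p_spam, p_ham in word_likelihoods:
        odds *= p_spam / p_ham               # multiply in each word's LR
    return odds / (1.0 + odds)               # back to a probability


# Illustrative values (assumed): prior 20%, two observed words.
posterior = naive_bayes_posterior(0.2, [(0.5, 0.1), (0.4, 0.2)])
print(round(posterior, 4))  # 0.7143
```

Here the prior odds 1:4 are multiplied by likelihood ratios 5 and 2, giving posterior odds 2.5:1.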

Sources

Updated on 2025-05-14 · Author(s): Clube da Matemática
