v1 · canonical standard

Lesson 79 — Deep Dive into Bayes' Theorem

Priors, posteriors, and sequential updating. Odds form, Beta-binomial conjugate prior, base rate fallacy, Naive Bayes. Applications in medical diagnosis, spam filtering, and ML.

Used in: Stochastik LK (Germany) · H2 Math Statistics (Singapore) · Math B (Japan) · equivalent to AP Statistics (USA)

$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$

Rigorous notation, full derivation, hypotheses

Definitions and theorems

Conditional probability

"The conditional probability $P(E \mid F)$ , the probability of $E$ given $F$ , expresses the probability of $E$ when we know that $F$ has occurred. It can be computed using the formula $P(E \mid F) = P(EF)/P(F)$ , assuming $P(F) > 0$ ." — Grinstead & Snell, Introduction to Probability, §4.1

Bayes' theorem

"Bayes' Theorem is just a formula that comes from the definition of conditional probability. Yet it is extremely powerful, and is the key to understanding what it means to rationally revise your beliefs in light of new evidence." — OpenIntro Statistics 4e, §3.2

Beta-binomial conjugate prior

Definition · Conjugate prior

The prior $\pi(\theta)$ is conjugate to the likelihood $L(\theta \mid \mathbf{x})$ if the posterior $\pi(\theta \mid \mathbf{x})$ belongs to the same parametric family as the prior.

For the Bernoulli model: if $X_1, \ldots, X_n \overset{\text{iid}}{\sim} \text{Bernoulli}(\theta)$ and $k = \sum X_i$, with prior $\theta \sim \text{Beta}(\alpha, \beta)$:

$\theta \mid k \sim \text{Beta}(\alpha + k,\; \beta + n - k)$

The prior Beta(1, 1) is the uniform distribution on $[0,1]$ (a common non-informative choice). Posterior mean: $(\alpha + k)/(\alpha + \beta + n)$.
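The conjugate update is simple enough to sketch in a few lines of Python. The helper names (`beta_binomial_update`, `beta_mean`) and the coin-flip data are illustrative assumptions, not part of the lesson. The sketch also checks that updating on two batches in sequence gives the same posterior as updating once on the pooled data.

```python
def beta_binomial_update(alpha, beta, k, n):
    """Return the posterior Beta parameters after k successes in n Bernoulli trials."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return alpha + k, beta + (n - k)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: uniform prior Beta(1, 1), then 7 successes in 10 trials.
posterior = beta_binomial_update(1, 1, k=7, n=10)

# Same data split into two batches (3 of 4, then 4 of 6), updated sequentially.
step1 = beta_binomial_update(1, 1, k=3, n=4)
step2 = beta_binomial_update(*step1, k=4, n=6)

print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
print("sequential update matches pooled update:", step2 == posterior)
```

Because the update only adds counts to the parameters, the order and grouping of the observations do not affect the final posterior.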

Figure: Bayes diagram as a 2×2 table of absolute frequencies.

The PPV (positive predictive value) is the Bayesian posterior P(sick | positive test). When prevalence is low, false positives can outnumber true positives even with a high-quality test.
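As a rough illustration of the 2×2 table, the following Python sketch fills in the four cells for a population of 10,000 and reports the PPV at several prevalences. The test characteristics (99% sensitivity and specificity) and the function names are hypothetical, chosen only to show the effect of the base rate.

```python
def confusion_counts(population, prevalence, sensitivity, specificity):
    """Expected absolute frequencies for the four cells of the 2x2 table."""
    sick = population * prevalence
    healthy = population - sick
    tp = sick * sensitivity        # sick and test positive
    fn = sick - tp                 # sick and test negative
    tn = healthy * specificity     # healthy and test negative
    fp = healthy - tn              # healthy and test positive
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


def ppv_from_counts(counts):
    """PPV = TP / (TP + FP), read directly from the table."""
    return counts["TP"] / (counts["TP"] + counts["FP"])


# Hypothetical test: 99% sensitivity and 99% specificity.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    table = confusion_counts(10_000, prevalence, sensitivity=0.99, specificity=0.99)
    print(f"prevalence {prevalence:>5}: TP={table['TP']:.1f}, "
          f"FP={table['FP']:.1f}, PPV={ppv_from_counts(table):.3f}")
```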

Solved examples

Example 1 · Direct calculation with the law of total probability (application)

Problem: A factory has three production lines: A (40% of production), B (35%), and C (25%). Defect rates are: A: 2%, B: 3%, C: 5%. A part is randomly selected and is defective. What is the probability it was produced by line B?

Strategy: Apply the law of total probability to calculate $P(\text{defect})$, then use Bayes' theorem to obtain $P(B \mid \text{defect})$.

Solution:

Define events: $H_A$, $H_B$, $H_C$ = part comes from line A, B, C. $D$ = defective part.
Priors: $P(H_A) = 0.40$, $P(H_B) = 0.35$, $P(H_C) = 0.25$.
Likelihoods: $P(D \mid H_A) = 0.02$, $P(D \mid H_B) = 0.03$, $P(D \mid H_C) = 0.05$.
Law of total probability: $P(D) = 0.02 \times 0.40 + 0.03 \times 0.35 + 0.05 \times 0.25 = 0.008 + 0.0105 + 0.0125 = 0.031$
Bayes for line B: $P(H_B \mid D) = \frac{0.03 \times 0.35}{0.031} = \frac{0.0105}{0.031} \approx 0.339$

Verification: Let's also calculate for A and C: $P(H_A \mid D) = 0.008/0.031 \approx 0.258$; $P(H_C \mid D) = 0.0125/0.031 \approx 0.403$. Sum: $0.258 + 0.339 + 0.403 = 1.000$. Correct.
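The same calculation can be sketched in Python with exact fractions, which avoids rounding in the check that the posteriors sum to 1. The function name `posterior` and the dictionary layout are illustrative choices, not part of the source.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Bayes' theorem over mutually exclusive, exhaustive hypotheses.

    priors[h] = P(h), likelihoods[h] = P(E | h); returns P(h | E) for each h.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}  # P(E and h)
    evidence = sum(joint.values())  # P(E), by the law of total probability
    if evidence == 0:
        raise ValueError("P(E) = 0, so the posterior is undefined")
    return {h: joint[h] / evidence for h in joint}


# Numbers from Example 1, written as exact fractions.
priors = {"A": Fraction(40, 100), "B": Fraction(35, 100), "C": Fraction(25, 100)}
defect_rates = {"A": Fraction(2, 100), "B": Fraction(3, 100), "C": Fraction(5, 100)}

result = posterior(priors, defect_rates)
for line, prob in result.items():
    print(f"P(H_{line} | D) = {prob} ≈ {float(prob):.3f}")
print("sum of posteriors:", sum(result.values()))
```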

Source. Grinstead & Snell — Introduction to Probability §4.1 — GNU FDL. (Problem adapted from the structure of Example 4.11, three causes with different likelihoods.)

Example 2 · Base rate fallacy and PPV (application)

Problem: Disease X affects 0.5% of the population. Diagnostic test: 95% sensitivity, 90% specificity. A patient tests positive. What is the positive predictive value?

Strategy: Use absolute frequencies in a sample of 10,000 people — method recommended by OpenIntro Statistics to avoid intuition errors.

Solution:

In 10,000 people:

Sick: $10{,}000 \times 0.005 = 50$.
True positives (TP): $50 \times 0.95 = 47.5 \approx 48$ (rounded to whole people).
Healthy: $10{,}000 - 50 = 9{,}950$.
False positives (FP): $9{,}950 \times (1 - 0.90) = 9{,}950 \times 0.10 = 995$.
Total positives: $48 + 995 = 1{,}043$.
$\text{PPV} = 48/1043 \approx 4.6\%$.

Verification: Via direct formula: $\text{PPV} = \frac{0.95 \times 0.005}{0.95 \times 0.005 + 0.10 \times 0.995} = \frac{0.00475}{0.00475 + 0.0995} = \frac{0.00475}{0.10425} \approx 4.6\%$

Matches. The intuition of "95% chance of being sick" is off by a factor of about 20: a classic illustration of the base rate fallacy.
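A short Python sketch can confirm that the frequency method and the direct formula agree, and show how little the rounding of 47.5 to 48 matters. The function name `ppv` is an illustrative choice; the numbers are those of Example 2.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value P(sick | positive) from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Example 2: prevalence 0.5%, sensitivity 95%, specificity 90%.
exact = ppv(0.005, 0.95, 0.90)

# Natural frequencies in 10,000 people, without rounding 47.5 up to 48.
population = 10_000
sick = population * 0.005
tp = sick * 0.95
fp = (population - sick) * (1 - 0.90)
from_counts = tp / (tp + fp)

print(f"PPV (formula):          {exact:.4f}")
print(f"PPV (unrounded counts): {from_counts:.4f}")
print(f"PPV (TP rounded to 48): {48 / (48 + 995):.4f}")
```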

Source. OpenIntro Statistics 4e §3.2 — CC-BY-SA. (Structure of Example 3.10, medical diagnosis with low prevalence.)

Example 3 · Urn with two causes, classic form (application)

Problem: Urn A contains 3 red balls and 2 blue. Urn B contains 1 red ball and 4 blue. An urn is chosen at random (50%-50%) and a ball is drawn. The ball is red. What is the probability the chosen urn was A?

Strategy: Two hypotheses ($H_A$ and $H_B$), uniform prior, evidence = red ball. Apply Bayes directly.

Solution:

Priors: $P(H_A) = P(H_B) = 0.5$.
Likelihoods: $P(R \mid H_A) = 3/5 = 0.60$; $P(R \mid H_B) = 1/5 = 0.20$.
Total probability of red ball: $P(R) = 0.60 \times 0.50 + 0.20 \times 0.50 = 0.30 + 0.10 = 0.40$
Posterior: $P(H_A \mid R) = \frac{0.60 \times 0.50}{0.40} = \frac{0.30}{0.40} = 0.75$

Verification: $P(H_B \mid R) = 0.10/0.40 = 0.25$. Sum $= 1$. Urn A has a proportion of red balls 3 times that of Urn B, and the priors are equal, so it makes sense that the posterior of A is 3 times that of B ($0.75 = 3 \times 0.25$).
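As an independent sanity check, the experiment can be simulated: pick an urn, draw a ball, and look only at the trials in which the ball is red. The sketch below is a minimal Monte Carlo version; the function name, trial count, and seed are arbitrary assumptions.

```python
import random


def simulate_urns(trials, seed=0):
    """Estimate P(urn A | red ball) by simulating the two-urn experiment."""
    rng = random.Random(seed)
    urns = {
        "A": ["red"] * 3 + ["blue"] * 2,
        "B": ["red"] * 1 + ["blue"] * 4,
    }
    red_draws = 0
    red_from_a = 0
    for _ in range(trials):
        urn = rng.choice("AB")          # each urn chosen with probability 1/2
        ball = rng.choice(urns[urn])    # one ball drawn uniformly from that urn
        if ball == "red":
            red_draws += 1
            if urn == "A":
                red_from_a += 1
    # Conditioning on the evidence: keep only the trials with a red ball.
    return red_from_a / red_draws


estimate = simulate_urns(100_000)
print(f"simulated P(A | red) = {estimate:.3f}")
```

The estimate should land close to the analytical posterior of 0.75, with the remaining gap due to sampling noise.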

Source. Grinstead & Snell — Introduction to Probability §4.1 — GNU FDL. (Adapted from Exercise 4.1.1 on two urns.)

Example 4 · Sequential updating with two tests (intermediate)

Problem: Disease prevalence: 2%. Two independent tests: Test 1 with 90% sensitivity and 95% specificity; Test 2 with 85% sensitivity and 92% specificity. Both are positive. What is the posterior after both positive results?

Strategy: Apply Bayes sequentially: the posterior of Test 1 becomes the prior for Test 2.

Solution:

Step 1 (after Test 1 positive): $P(D \mid T_1^+) = \frac{0.90 \times 0.02}{0.90 \times 0.02 + 0.05 \times 0.98} = \frac{0.018}{0.018 + 0.049} = \frac{0.018}{0.067} \approx 0.2687$

Step 2 (after Test 2 positive, prior $\approx 0.2687$): $P(D \mid T_1^+, T_2^+) = \frac{0.85 \times 0.2687}{0.85 \times 0.2687 + 0.08 \times 0.7313} = \frac{0.2284}{0.2284 + 0.0585} = \frac{0.2284}{0.2869} \approx 0.796$

Verification by odds form:

Prior odds: $0.02/0.98 = 1/49 \approx 0.0204$.
$\text{LR}_1^+ = 0.90/0.05 = 18$; $\text{LR}_2^+ = 0.85/0.08 = 10.625$.
Posterior odds: $\frac{1}{49} \times 18 \times 10.625 = \frac{191.25}{49} \approx 3.903$.
Posterior: $3.903/(1 + 3.903) \approx 79.6\%$. This agrees with the sequential calculation up to rounding.
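Both routes can be checked numerically without intermediate rounding. The Python sketch below applies the single-test update twice and then repeats the calculation in odds form; the function names `update` and `positive_lr` are illustrative, and conditional independence of the two tests given disease status is assumed, as in the example.

```python
def update(prior, sensitivity, specificity):
    """Posterior P(D | positive result) after one positive test."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)


def positive_lr(sensitivity, specificity):
    """Positive likelihood ratio LR+ = sens / (1 - spec)."""
    return sensitivity / (1 - specificity)


prior = 0.02

# Route 1: sequential updating, posterior of Test 1 becomes the prior of Test 2.
after_t1 = update(prior, sensitivity=0.90, specificity=0.95)
after_t2 = update(after_t1, sensitivity=0.85, specificity=0.92)

# Route 2: odds form, multiplying the prior odds by both likelihood ratios.
prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * positive_lr(0.90, 0.95) * positive_lr(0.85, 0.92)
via_odds = posterior_odds / (1 + posterior_odds)

print(f"after Test 1: {after_t1:.4f}")
print(f"after Test 2: {after_t2:.4f}")
print(f"odds form:    {via_odds:.4f}")
```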

Source. OpenIntro Statistics 4e §3.3 — CC-BY-SA. (Extension of Example 3.13 on sequential updating with two independent tests.)

Example 5 · Beta-binomial conjugate prior (advanced)

Problem: A quality-control team uses the prior Beta(2, 8) for the defect rate $\theta$ of a production line (roughly equivalent to having historically observed 2 defects in 10 inspections). In a new batch, 20 parts are inspected and 4 are found defective. Determine: (a) the posterior, (b) the posterior mean, (c) an approximate 90% credible interval.

Strategy: Use Beta-binomial conjugacy: the posterior is Beta($\alpha + k$, $\beta + n - k$). For the interval, use the normal approximation to the Beta distribution, which is reasonable for moderate parameter values.

Solution:

(a) Posterior: Prior Beta(2, 8), $n = 20$, $k = 4$. $\theta \mid 4 \sim \text{Beta}(2 + 4,\; 8 + 20 - 4) = \text{Beta}(6, 24)$

(b) Posterior mean: with posterior parameters $\alpha = 6$, $\beta = 24$, $\mu = \alpha/(\alpha + \beta) = 6/(6 + 24) = 6/30 = 0.20$.

Compare with the prior: $\mu_{\text{prior}} = 2/10 = 0.20$. The MLE would be $k/n = 4/20 = 0.20$. All three coincide here because the observed defect proportion equals the prior mean.

(c) 90% credible interval: The Beta(6, 24) distribution has standard deviation $\sigma = \sqrt{\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))} = \sqrt{6 \times 24/(900 \times 31)} = \sqrt{144/27900} \approx 0.0718$.

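Approximate interval $\mu \pm 1.645\sigma$: $[0.20 - 0.118,\; 0.20 + 0.118] = [0.082, 0.318]$. (The exact equal-tailed interval, from the 5% and 95% quantiles of Beta(6, 24), is approximately $[0.094, 0.329]$; it sits to the right of the normal approximation because the Beta(6, 24) density is right-skewed.)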

Verification: As $n \to \infty$, the influence of the prior decreases and the posterior concentrates on the MLE. With prior Beta(2, 8) and 200 parts observed with 40 defects, the posterior would be Beta(42, 168) with mean $42/210 = 0.20$ and standard deviation $\approx 0.028$, much more concentrated.
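The posterior summary can be reproduced with the standard library alone. The sketch below computes the mean, standard deviation, and normal-approximation interval, and also an equal-tailed interval from the quantiles. For integer parameters it uses the identity $I_x(a, b) = P(\text{Bin}(a+b-1, x) \geq a)$ for the Beta CDF and inverts it by bisection; all function names are illustrative, and a library routine for Beta quantiles would normally replace the hand-rolled one.

```python
from math import comb, sqrt


def beta_sd(a, b):
    """Standard deviation of Beta(a, b)."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x, valid for positive integer a and b only."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


def beta_quantile_int(p, a, b, tol=1e-10):
    """Quantile of Beta(a, b) by bisection on the (increasing) CDF."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf_int(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Posterior from Example 5: prior Beta(2, 8), 4 defects in 20 parts.
a, b = 2 + 4, 8 + 20 - 4
mean = a / (a + b)
sd = beta_sd(a, b)
normal_interval = (mean - 1.645 * sd, mean + 1.645 * sd)
exact_lo = beta_quantile_int(0.05, a, b)
exact_hi = beta_quantile_int(0.95, a, b)

print(f"posterior Beta({a}, {b}): mean={mean:.3f}, sd={sd:.4f}")
print(f"normal approximation: [{normal_interval[0]:.3f}, {normal_interval[1]:.3f}]")
print(f"equal-tailed quantiles: [{exact_lo:.3f}, {exact_hi:.3f}]")
```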

Source. OpenIntro Statistics 4e §3.4 — CC-BY-SA. (Structure of the exercise on Bayesian inference with conjugate prior, introductory Bayesian inference section.)

Exercise list

40 exercises · 10 with worked solution (25%)

Application 18 · Understanding 4 · Modeling 10 · Challenge 5 · Proof 3

Ex. 79.1 · Application · Answer key
$P(A) = 0.3$, $P(B) = 0.5$, $P(A \cap B) = 0.15$. Calculate $P(A \mid B)$.
Ex. 79.2 · Application
$P(A \mid B) = 0.6$, $P(B) = 0.5$. Calculate $P(A \cap B)$.
Ex. 79.3 · Application
$P(A) = 0.1$, $P(B \mid A) = 0.8$, $P(B \mid \bar A) = 0.2$. Calculate $P(B)$.
Ex. 79.4 · Application
With the data from exercise 79.3, calculate $P(A \mid B)$.
Ex. 79.5 · Application · Answer key
Disease with 0.5% prevalence. Diagnostic test: 95% sensitivity, 95% specificity. Calculate the PPV using frequencies in 10,000 people.
Ex. 79.6 · Application · Answer key
Same data as exercise 79.5, but with 50% prevalence. Calculate the PPV and compare with the previous result.
Ex. 79.7 · Application
Spam filter: $P(\text{spam}) = 0.3$. The word "FREE" appears in 60% of spam emails and 5% of legitimate emails. Calculate $P(\text{spam} \mid \text{FREE})$.
Ex. 79.8 · Application
Urn A: 2 red, 3 blue. Urn B: 5 red, 1 blue. An urn is chosen at random and a red ball is drawn. What is the probability the urn is A?
Ex. 79.9 · Application · Answer key
3 coins: 2 fair, 1 double-headed. One is chosen at random, flipped once, and comes up heads. What is the probability the chosen coin is the double-headed one?
Ex. 79.10 · Application
$P(\text{smoker}) = 0.2$. $P(\text{cancer} \mid \text{smoker}) = 0.1$. $P(\text{cancer} \mid \neg\text{smoker}) = 0.01$. Given a person has cancer, what is the probability they are a smoker?
Ex. 79.11 · Application
Sequential updating: two positive tests with 90% sensitivity and 90% specificity, applied to a disease with 1% prevalence. Use the posterior of the 1st test as the prior of the 2nd. What is the PPV after both consecutive positive results?
Ex. 79.12 · Application
For a test with 90% sensitivity and 95% specificity, calculate the positive likelihood ratio $\text{LR}^+ = \text{sens}/(1 - \text{spec})$.
Ex. 79.13 · Application
Prior odds of 1:99 (1% prevalence). $\text{LR}^+ = 18$ (90% sensitivity, 95% specificity). Calculate the posterior odds and the posterior.
Ex. 79.14 · Application
Which of the following values is the correct posterior in a context with prior odds 1:99 and $\text{LR}^+ = 18$?
Ex. 79.15 · Application
Prior $\theta \sim \text{Beta}(2, 2)$. 7 heads observed in 10 flips. Determine the posterior.
Ex. 79.16 · Application
Prior $\theta \sim \text{Beta}(1, 1)$ (uniform). 0 heads observed in 5 flips. Determine the posterior and its mean.
Ex. 79.17 · Application
In exercise 79.15, what is the posterior mean?
Ex. 79.18 · Application
Prior $\theta \sim \text{Beta}(2, 8)$. New batch: 30 parts inspected, 6 defective. Determine the posterior and posterior mean.
Ex. 79.19 · Modeling · Answer key
COVID-19 in endemic phase: 5% prevalence. Rapid test: 80% sensitivity, 95% specificity. Calculate the PPV using frequencies in 10,000 people. Is it worth automatically isolating all positives?
Ex. 79.20 · Modeling
Naive Bayes for email: $P(\text{spam}) = 0.3$. In training, "FREE" appears in 60% of spam emails and 5% of ham (legitimate) emails; "won" appears in 50% of spam emails and 10% of ham emails. An email contains both words. Classify it assuming conditional independence.
Ex. 79.21 · Modeling
Three diseases: A (10% in population), B (5%), C (1%). A patient presents symptom S with $P(S \mid A) = 0.3$, $P(S \mid B) = 0.9$, $P(S \mid C) = 0.9$. Which disease is most likely?
Ex. 79.22 · Modeling
Prosecutor's fallacy: DNA evidence has a frequency of 1/1000 in the population. The prosecutor claims the probability of innocence is 1/1000. Why is this reasoning wrong? Calculate the correct posterior assuming there are 100,000 plausible suspects in the city.
Ex. 79.23 · Modeling · Answer key
Fraud classifier: 95% sensitivity, 99.9% specificity. Frauds: 0.1% of transactions. Calculate the PPV. How many false positives are there for every true positive?
Ex. 79.24 · Modeling
Pregnancy test: 99% sensitivity, 98% specificity. Woman with prior probability of pregnancy of 30%. Calculate the PPV.
Ex. 79.25 · Modeling · Answer key
Polygraph: 70% sensitivity, 80% specificity. It is used in the interrogation of a suspect who has a 5% prior of guilt. Calculate the posterior after a positive result. Is the result admissible as sufficient evidence to convict?
Ex. 79.26 · Modeling · Answer key
Two independent positive tests ($\text{sens}_1 = 0.9$, $\text{spec}_1 = 0.95$; $\text{sens}_2 = 0.85$, $\text{spec}_2 = 0.90$). Prevalence 2%. Calculate the posterior after both positive results via sequential updating.
Ex. 79.27 · Modeling
In a lineup, one suspect has red hair (H) with a 70% probability of being the culprit. A witness identifies the red-haired one with 90% probability when the culprit is H, and erroneously 15% of the time when the culprit is not H. Given the witness pointed to H, what is the posterior of guilt?
Ex. 79.28 · Modeling
Quality control with 3 lines (A: 40% of production, 2% defect; B: 35%, 3%; C: 25%, 5%). A defective part is found. Determine the probability of each line being the origin.
Ex. 79.29 · Understanding
What is the base rate fallacy?
Ex. 79.30 · Understanding
Why does the prior matter even in "objective science"? An analysis that ignores the prior is equivalent to what implicit assumption?
Ex. 79.31 · Understanding
Two independent positive tests with likelihood ratios $r_1$ and $r_2$. What is the effect on the odds form?
Ex. 79.32 · Understanding
What is the practical difference between using a Beta(1,1) prior and a Beta(10,10) prior for a coin? In which case will the posterior be more sensitive to new data?
Ex. 79.33 · Challenge
Show that two positive tests that are conditionally independent given $H$ (and given $\neg H$) result in posterior odds equal to $r_1 \times r_2 \times$ prior odds, where $r_i = \text{LR}_i^+$.
Ex. 79.34 · Challenge
Prove that the posterior of the Bernoulli-Beta model is Beta($\alpha + k$, $\beta + n - k$) when the prior is Beta($\alpha$, $\beta$) and we observe $k$ successes in $n$ trials.
Ex. 79.35 · Proof
Prove Bayes' theorem from the definition of conditional probability and the law of total probability.
Ex. 79.36 · Proof
Show that $P(A \mid B) = P(B \mid A)\,P(A)/P(B)$ using only the definition of conditional probability. Explain why $P(A \mid B) \neq P(B \mid A)$ in general.
Ex. 79.37 · Challenge
Monty Hall problem with 3 doors. Use Bayes to calculate the probability of the car being behind each door after Monty (who knows where the car is) opens an empty door. Should you switch?
Ex. 79.38 · Challenge · Answer key
In Naive Bayes with binary features, show that the classifier is equivalent to multiplying the individual LRs of each feature. What happens when the conditional independence assumption is violated?
Ex. 79.39 · Proof · Answer key
Prove that the odds form of Bayes, posterior odds = LR $\times$ prior odds, follows directly from the usual form of Bayes' theorem for two complementary events $H$ and $\neg H$.
Ex. 79.40 · Challenge
Show that the mean of the posterior Beta($\alpha + k$, $\beta + n - k$) converges to the maximum likelihood estimator $k/n$ when $n \to \infty$, for any fixed prior Beta($\alpha$, $\beta$). What does this imply about the relationship between Bayes and frequentism for large samples?

Sources

Grinstead, C.M. & Snell, J.L. — Introduction to Probability (2nd ed.) · GNU FDL · Dartmouth College. Chapter 4 (§4.1): Conditional probability, independence, Bayes' theorem — primary source for most urn, coin, and proof exercises in this lesson.
Diez, D.M., Çetinkaya-Rundel, M., Barr, C.D. — OpenIntro Statistics (4th ed.) · CC-BY-SA · OpenIntro. Sections §3.2–3.4: conditional probability, Bayes, frequency tables, and Bayesian updating — source for PPV, sequential updating, and conjugate prior exercises.
Illowsky, B. & Dean, S. — Statistics (OpenStax) · CC-BY · OpenStax. Section §3.4 (Contingency Tables and Probability Trees): medical diagnosis, spam filtering, and probability trees — basis for Naive Bayes and fraud exercises.