v1 · canonical standard

Lesson 109 — Introductory Bayesian Statistics

Prior, likelihood, posterior. Bayes' rule. Beta-Bernoulli conjugates. MAP versus MLE. Credible interval. Introduction to inference through the Bayesian paradigm.

Used in: Stochastik LK (Germany, Klasse 12) · H2 Math Statistics (Singapore) · AP Statistics (USA)

$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$

Rigorous notation, full derivation, hypotheses

Rigorous definition

Bayes' theorem

Definition · Bayes' rule (general form)

Let $\theta$ be a parameter (or hypothesis) and $D$ the observed data. Define:

$P(\theta)$: the prior, the probability distribution of the parameter before observing data.
$P(D \mid \theta)$: the likelihood, the probability of the data given the parameter.
$P(D)$: the evidence (marginal likelihood), $P(D) = \int P(D \mid \theta)\, P(\theta)\, d\theta$ (or the corresponding sum when $\theta$ is discrete).
$P(\theta \mid D)$: the posterior, the updated distribution after observing $D$.

Bayes' rule:

$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)} \propto P(D \mid \theta)\, P(\theta)$$

what this means · The posterior is proportional to likelihood times prior. The denominator P(D) is just a normalizing constant.
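For a finite set of hypotheses the rule reduces to "multiply, then normalize". The following is a minimal sketch of that computation; the function name `bayes_update` and the three-value example are assumptions made for illustration and do not come from the lesson.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over a finite set of hypotheses.

    prior:      dict mapping hypothesis -> P(theta)
    likelihood: dict mapping hypothesis -> P(D | theta) for the observed data
    """
    # Unnormalized posterior: likelihood times prior, hypothesis by hypothesis.
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    # Evidence P(D): the discrete sum that normalizes the posterior.
    evidence = sum(unnormalized.values())
    return {h: w / evidence for h, w in unnormalized.items()}


# Hypothetical example: theta is one of three values, equally likely a priori,
# and the data D is a single observed success, so P(D | theta) = theta.
thetas = [0.2, 0.5, 0.8]
prior = {t: 1 / 3 for t in thetas}
likelihood = {t: t for t in thetas}
posterior = bayes_update(prior, likelihood)
```

Because the prior is flat here, the posterior weights are simply the likelihoods rescaled to sum to 1, which is the proportionality statement in action.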

"Bayes' theorem is a basic result of conditional probability, but its interpretation changes everything: it offers a formal recipe for updating beliefs in light of evidence." — OpenIntro Statistics §3.6

Conjugate priors: the Beta-Bernoulli case

Definition · Beta-Bernoulli conjugate family

When $X_1, \ldots, X_n \stackrel{\text{iid}}{\sim} \text{Bernoulli}(\theta)$ and the prior is $\theta \sim \text{Beta}(\alpha, \beta)$, the posterior has a closed form:

$$\theta \mid D \;\sim\; \text{Beta}(\alpha + s,\; \beta + n - s)$$

what this means · After s successes in n trials, the posterior is another Beta with updated parameters.

where $s = \sum_i X_i$ is the number of successes. The Beta is conjugate to the Bernoulli: prior and posterior belong to the same family.

The Beta distribution has density:

$$f(\theta; a, b) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}, \quad \theta \in [0, 1]$$

what this means · Density of Beta(a,b): defined on [0,1], controlled by parameters a and b.

with $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$.
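Conjugacy can be checked numerically before it is proved. The sketch below, which is an illustration and not a derivation, normalizes likelihood times prior on a grid and compares the result with the closed-form Beta posterior. The helper names `beta_pdf` and `grid_posterior` and the example numbers are assumptions chosen for illustration.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta in (0, 1), computed via log-gamma."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)  # log B(a, b)
    return math.exp((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_norm)


def grid_posterior(alpha, beta, s, n, m=2000):
    """Normalize likelihood * prior on a midpoint grid of m points in (0, 1)."""
    grid = [(i + 0.5) / m for i in range(m)]
    weights = [t**s * (1 - t) ** (n - s) * beta_pdf(t, alpha, beta) for t in grid]
    total = sum(weights) / m  # midpoint-rule estimate of the evidence P(D)
    return grid, [w / total for w in weights]


# Illustrative values (not from the lesson): prior Beta(2, 3), s = 4 successes in n = 7.
alpha, beta, s, n = 2, 3, 4, 7
grid, numeric = grid_posterior(alpha, beta, s, n)
closed_form = [beta_pdf(t, alpha + s, beta + n - s) for t in grid]
max_gap = max(abs(x - y) for x, y in zip(numeric, closed_form))
```

The largest pointwise gap `max_gap` should be negligible, which is what "prior and posterior belong to the same family" predicts.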

Point estimators

Definition · MAP and MLE

Given the posterior $P(\theta \mid D)$:

MAP (maximum a posteriori): $\hat\theta_{\text{MAP}} = \arg\max_\theta P(\theta \mid D)$. Maximizes the posterior, so it includes prior information.
MLE (maximum likelihood): $\hat\theta_{\text{MLE}} = \arg\max_\theta P(D \mid \theta)$. Maximizes the likelihood, so it ignores the prior.
Posterior mean (the Bayes estimator under quadratic, i.e. squared-error, loss): $E[\theta \mid D]$.

Relationship: with a uniform prior ($P(\theta) = c$), MAP = MLE. As $n \to \infty$, the posterior concentrates around $\hat\theta_{\text{MLE}}$ (provided the prior puts positive density near the true value), so the prior matters less and less as data accumulate.

For Beta-Bernoulli:

$$\hat\theta_{\text{MAP}} = \frac{\alpha + s - 1}{\alpha + \beta + n - 2}, \quad E[\theta \mid D] = \frac{\alpha + s}{\alpha + \beta + n}$$

what this means · MAP and posterior mean for the Beta-Bernoulli case. The MAP formula requires $\alpha + s > 1$ and $\beta + n - s > 1$, so that the posterior mode lies inside $(0, 1)$. Compare with MLE = s/n (empirical proportion).
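The three point estimators can be computed side by side. This is a small sketch of the formulas above; the function name `beta_bernoulli_estimates` and the example values are hypothetical.

```python
def beta_bernoulli_estimates(alpha, beta, s, n):
    """Return (MLE, MAP, posterior mean) for s successes in n Bernoulli trials."""
    a_post, b_post = alpha + s, beta + n - s
    mle = s / n
    # The mode formula holds only when both posterior parameters exceed 1.
    if a_post <= 1 or b_post <= 1:
        raise ValueError("interior mode requires alpha + s > 1 and beta + n - s > 1")
    map_estimate = (a_post - 1) / (a_post + b_post - 2)
    posterior_mean = a_post / (a_post + b_post)
    return mle, map_estimate, posterior_mean


# Illustrative values (not from the lesson): prior Beta(2, 5), 3 successes in 10 trials.
mle, map_estimate, posterior_mean = beta_bernoulli_estimates(2, 5, 3, 10)
```

With this prior, whose mean 2/7 lies below the sample proportion, both Bayesian estimates land below the MLE of 0.3.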

Bayes Factor

Bayesian flow: prior × likelihood → posterior. The posterior becomes the new prior when more data arrives.
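The heading above names the Bayes factor, but no definition appears in this text. The standard definition, written in the notation that Example 5 uses ($H_0$, $H_1$, $BF_{10}$), is the ratio of the probabilities of the data under the two hypotheses, and it converts prior odds into posterior odds:

$$BF_{10} = \frac{P(D \mid H_1)}{P(D \mid H_0)}, \qquad \frac{P(H_1 \mid D)}{P(H_0 \mid D)} = BF_{10} \times \frac{P(H_1)}{P(H_0)}$$

When the two hypotheses are equally probable a priori, the prior odds equal 1 and the second identity gives $P(H_1 \mid D) = BF_{10} / (1 + BF_{10})$.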

Worked examples

Example 1 · Discrete Bayes' rule: diagnosis (basic)

Problem. A disease affects 2% of the population. A test has sensitivity 90% and specificity 85%. A patient tests positive. What is the probability that the patient has the disease?

Strategy. Apply Bayes' formula with the partition $\{D, \neg D\}$, where $D$ here denotes "has the disease", and calculate $P(D \mid +)$.

Solution.

$P(D) = 0.02$, $P(\neg D) = 0.98$, $P(+ \mid D) = 0.90$, $P(+ \mid \neg D) = 1 - 0.85 = 0.15$.

Total evidence: $P(+) = P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D) = 0.90 \times 0.02 + 0.15 \times 0.98 = 0.018 + 0.147 = 0.165$

Posterior: $P(D \mid +) = \frac{0.90 \times 0.02}{0.165} = \frac{0.018}{0.165} \approx 0.109 = 10.9\%$

Verification. With only 2% prevalence, even a relatively good test generates many false positives. The answer $\approx 11\%$ makes sense: most positives come from the huge healthy population.

Source. OpenIntro Statistics §3.6, medical diagnosis example — CC-BY-SA.
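The calculation in Example 1 generalizes to any prevalence, sensitivity, and specificity. A minimal sketch follows; the function name `posterior_given_positive` is an assumption, not something taken from the cited source.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) from Bayes' rule with the partition {D, not D}."""
    true_positive = sensitivity * prevalence                 # P(+ | D) P(D)
    false_positive = (1 - specificity) * (1 - prevalence)    # P(+ | not D) P(not D)
    return true_positive / (true_positive + false_positive)  # divide by P(+)


# Numbers from Example 1: prevalence 2%, sensitivity 90%, specificity 85%.
p = posterior_given_positive(0.02, 0.90, 0.85)
print(round(p, 3))  # 0.109
```

Changing only the prevalence argument shows how strongly the base rate drives the answer, which is the point of the verification step.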

Example 2 · Sequential Beta-Bernoulli update (intermediate)

Problem. An urn has an unknown proportion $\theta$ of red balls. Prior: $\theta \sim \text{Beta}(2, 2)$. Draw with replacement: 1st sample, 3 red in 5 draws; 2nd sample, 4 red in 6 draws. Calculate the posterior after each sample and the final posterior mean.

Strategy. Beta-Bernoulli: after $s$ successes in $n$ trials, $\text{Beta}(\alpha, \beta) \to \text{Beta}(\alpha+s, \beta+n-s)$. Apply iteratively.

Solution.

Prior: $\text{Beta}(2, 2)$, mean $= 2/4 = 0.50$.

After 1st sample ($s = 3$, $n = 5$): $\text{Beta}(2+3,\; 2+5-3) = \text{Beta}(5, 4), \quad E[\theta] = \frac{5}{9} \approx 0.556$

After 2nd sample ($s = 4$, $n = 6$): $\text{Beta}(5+4,\; 4+6-4) = \text{Beta}(9, 6), \quad E[\theta] = \frac{9}{15} = 0.600$

Verification. Total data: 7 red in 11 draws, sample proportion $7/11 \approx 0.636$. The posterior mean 0.60 lies between the prior mean (0.50) and the sample proportion, as expected. Because the prior is weak ($\alpha + \beta = 4$ acts like 4 pseudo-observations), the posterior mean approaches the MLE as $n$ grows.

Source. Think Bayes §3 — Allen Downey — CC-BY-NC-SA.
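Sequential updating and a single pooled update give the same posterior, because only the running counts of successes and failures matter. The sketch below checks this on the numbers of Example 2; the helper name `update` is an assumption.

```python
from functools import reduce


def update(params, batch):
    """One Beta-Bernoulli step: (alpha, beta) and (successes, trials) -> new (alpha, beta)."""
    alpha, beta = params
    s, n = batch
    return alpha + s, beta + n - s


prior = (2, 2)                 # Beta(2, 2) from Example 2
batches = [(3, 5), (4, 6)]     # (red balls, draws) for each sample

sequential = reduce(update, batches, prior)   # update after each sample in turn
pooled = update(prior, (7, 11))               # all 11 draws in a single update
posterior_mean = sequential[0] / sum(sequential)
```

Both routes end at Beta(9, 6), so the order and grouping of the samples do not affect the final posterior.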

Example 3 · MAP vs MLE vs posterior mean (intermediate)

Problem. For the Beta-Bernoulli model with $\alpha = 3$, $\beta = 3$, after 6 successes in 10 trials, calculate the MLE, MAP, and posterior mean. Interpret the difference.

Strategy. MLE maximizes the likelihood; MAP maximizes the posterior; posterior mean is $E[\theta \mid D]$.

Solution.

MLE: $\hat\theta_{\text{MLE}} = s/n = 6/10 = 0.600$.

Posterior: $\text{Beta}(3+6, 3+4) = \text{Beta}(9, 7)$.

MAP (the mode of Beta$(a, b)$ with $a, b > 1$ is $(a-1)/(a+b-2)$): $\hat\theta_{\text{MAP}} = \frac{9 - 1}{9 + 7 - 2} = \frac{8}{14} \approx 0.571$

Posterior mean: $E[\theta \mid D] = \frac{9}{9 + 7} = \frac{9}{16} = 0.5625$

Verification. Ordering: prior mean $(0.5)$ < posterior mean $(0.5625)$ < MAP $(0.571)$ < MLE $(0.60)$. The MLE is largest because it ignores the prior; the symmetric Beta(3, 3) prior pulls both Bayesian estimates toward 0.5. With large $n$, all three estimates converge.

Source. Think Bayes §4, §6 — Allen Downey — CC-BY-NC-SA.

Example 4 · 95% credible interval for a Beta posterior (advanced)

Problem. After 12 successes in 20 trials with prior $\text{Beta}(1, 1)$ (uniform), calculate the central 95% credible interval for $\theta$.

Strategy. Posterior $\text{Beta}(13, 9)$. The central 95% interval is given by the 2.5th and 97.5th percentiles of the Beta distribution.

Solution.

Posterior: $\text{Beta}(1+12, 1+8) = \text{Beta}(13, 9)$.

Posterior mean: $13/(13+9) = 13/22 \approx 0.591$.

By table or software (R: qbeta(c(0.025, 0.975), 13, 9)):

2.5th percentile: $\approx 0.384$. 97.5th percentile: $\approx 0.782$.

95% credible interval: $(0.384,\; 0.782)$.

Verification. Direct interpretation: "given the uniform prior and the data, the probability that $\theta$ is between 0.384 and 0.782 is 95%". Note that the interval is not centered at 0.6 (its midpoint is about 0.583) because the Beta(13, 9) density is asymmetric.

Source. Introduction to Probability §4.1 — Grinstead & Snell — GNU FDL.
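When no table or statistics package is at hand, the percentiles in Example 4 can be approximated with the standard library alone. The sketch below integrates the Beta density with Simpson's rule and inverts the CDF by bisection. It is one possible numerical approach, not the method of the cited source, and the helper names `beta_pdf`, `beta_cdf`, and `beta_quantile` are assumptions.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, with the endpoints handled explicitly."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0  # correct for a, b > 1, which is the case used here
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_norm)


def beta_cdf(x, a, b, steps=2000):
    """P(theta <= x) by composite Simpson integration of the density on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3


def beta_quantile(p, a, b, tol=1e-8):
    """Invert the CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Posterior from Example 4: Beta(13, 9); central 95% interval.
lower = beta_quantile(0.025, 13, 9)
upper = beta_quantile(0.975, 13, 9)
```

Bisection is slow compared with a dedicated quantile routine, but it needs nothing beyond a monotone CDF, which makes it a convenient cross-check on software output.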

Example 5 · Bayes factor: hypothesis comparison (proof)

Problem. To test whether $\theta = 0.5$ (fair coin) versus $\theta = 0.7$ (biased coin), with equiprobable prior ($P(H_0) = P(H_1) = 0.5$), calculate the Bayes factor and the posterior probability of $H_1$ after 8 heads in 10 flips.

Strategy. Calculate $P(D \mid H_i)$ for each hypothesis, then apply Bayes.

Solution.

$P(D \mid H_0) = \binom{10}{8}(0.5)^{10} = 45 \times \frac{1}{1024} \approx 0.0439$

$P(D \mid H_1) = \binom{10}{8}(0.7)^8(0.3)^2 = 45 \times 0.05765 \times 0.09 \approx 0.2335$

Bayes factor: $BF_{10} = \frac{0.2335}{0.0439} \approx 5.31$ (computed from the unrounded likelihoods; the binomial coefficient cancels)

With prior $P(H_0) = P(H_1) = 0.5$: $P(H_1 \mid D) = \frac{BF_{10}}{1 + BF_{10}} = \frac{5.31}{6.31} \approx 0.842$

Verification. $BF_{10} \approx 5.31$ indicates moderate evidence for $H_1$ (a Bayes factor between 3 and 10 is commonly labeled "moderate"; Jeffreys' original scale calls roughly this range "substantial"). The posterior probability of a biased coin went from 50% to 84%, consistent with the data (8 heads out of 10 favors $\theta = 0.7$).

Source. OpenIntro Statistics §3.7 — Diez, Çetinkaya-Rundel, Barr — CC-BY-SA.
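The steps of Example 5 can be reproduced without intermediate rounding. This is a minimal sketch using only the standard library; the function name `binomial_likelihood` is an assumption.

```python
from math import comb


def binomial_likelihood(theta, heads, flips):
    """P(D | theta): probability of `heads` heads in `flips` independent flips."""
    return comb(flips, heads) * theta**heads * (1 - theta) ** (flips - heads)


heads, flips = 8, 10
like_h0 = binomial_likelihood(0.5, heads, flips)  # fair coin
like_h1 = binomial_likelihood(0.7, heads, flips)  # biased coin

bf_10 = like_h1 / like_h0                  # Bayes factor for H1 against H0
prior_odds = 0.5 / 0.5                     # equal prior probabilities
posterior_odds = bf_10 * prior_odds
posterior_h1 = posterior_odds / (1 + posterior_odds)
```

Keeping `prior_odds` as a separate variable makes it easy to see how a skeptical prior, for example odds of 1 to 9 against a biased coin, would change the conclusion while the Bayes factor stays the same.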

Exercise list

34 exercises · 8 with worked solution (24%)

Application 24 · Understanding 3 · Modeling 4 · Challenge 1 · Proof 2

Ex. 109.1 · Application
Prevalence of a disease: 1%. Test sensitivity: 95%. False positive rate: 10%. A patient tests positive. Calculate the probability of having the disease.
Ex. 109.2 · Application
A coin flipped 10 times gives 4 heads. Prior: Beta(1,1) (uniform). Calculate the posterior, posterior mean, and compare with MLE.
Ex. 109.3 · Application
Prior: Beta(4, 6). Sample: 7 successes in 10. Calculate the posterior, posterior mean, and MAP.
Ex. 109.4 · Application
Prior: Beta(2, 2). Batch 1: 5 successes in 10. Batch 2: 8 successes in 10. Do the sequential update and calculate the final posterior mean.
Ex. 109.5 · Application
Prevalence: 0.5%. Sensitivity: 99%. False positive rate: 2%. Patient tests positive. What is the probability of having the disease?
Ex. 109.6 · Application
3 successes in 10 trials. Compare the posterior mean with priors Beta(1,1) and Beta(5,5). Which prior has greater influence on the posterior?
Ex. 109.7 · Application
Three factories produce bolts: E1 (60% of production, 30% defective), E2 (30%, 50% defective), E3 (10%, 10% defective). A defective bolt is drawn. What is the probability it came from E1?
Ex. 109.8 · Application
Prior: Beta(3, 3) (slight belief in fair coin, mean 0.5). Flip 5 times and get 0 heads. Calculate the posterior and the new mean.
Ex. 109.9 · Application
Prior: Beta(1,1). Data: 15 successes in 20. Calculate MAP and MLE. Are they equal? Why?
Ex. 109.10 · Application
Bag with two coins: one always gives heads (H), the other is fair (F). One is chosen at random. Flipped twice, both heads. What is the probability it is the H coin?
Ex. 109.11 · Understanding
What does a 95% Bayesian credible interval mean?
Ex. 109.12 · Understanding · Answer key
Which statement about MAP and MLE is INCORRECT?
Ex. 109.13 · Understanding
How does sample size n affect the relationship between prior and posterior?
Ex. 109.14 · Application
A student passes the exam ($A$). Known: $P(A \mid B_1) = 0.8$ (studied hard, probability 60%), $P(A \mid B_2) = 0.2$ (did not study, probability 40%). Given that he passed, what is the probability he studied hard?
Ex. 109.15 · Application
A machine has unknown success rate. Prior: Beta(4, 2) (history of 4 successes and 2 failures). New test: 6 consecutive successes. Calculate the posterior, mean, and MAP.
Ex. 109.16 · Application
Calculate the Bayes Factor for $H_1: \theta = 0.7$ versus $H_0: \theta = 0.5$ after 8 heads in 10 flips.
Ex. 109.17 · Application · Answer key
Three batches of 10 trials each: 7 successes, 6 successes, 7 successes. Prior: Beta(1,1). Do the sequential update and calculate the final posterior mean.
Ex. 109.18 · Application
Prevalence: 30%. Sensitivity: 95%. False positive rate: 20%. Patient tests positive. Calculate the probability of having the disease and compare with exercise 109.1.
Ex. 109.19 · Application · Answer key
Show that the posterior mean of the Beta-Bernoulli model is a weighted average between the prior and the sample proportion. Identify the weights.
Ex. 109.20 · Application
Prior: Beta(2, 2). Data: 0 successes in 3. Calculate the posterior, MAP, and posterior mean.
Ex. 109.21 · Application
Probability of rain in Fortaleza on a given day: 40%. If it rains, there is an 85% chance of dark clouds. If it does not rain, 30%. There are dark clouds. What is the probability it will rain?
Ex. 109.22 · Application · Answer key
Production history: 10% defects (equivalent to 10 defects in 100 parts = Beta(10,90)). New inspection: 3 defects in 20. Calculate the posterior and posterior mean.
Ex. 109.23 · Application
Bag with 3 coins: 1 always gives heads (H), 2 are fair (F). One coin is drawn randomly and flipped: heads appears. What is the probability it is the H coin?
Ex. 109.24 · Application
Prior Beta(1,1). Data: 10 successes in 20. Describe the posterior and the central 95% credible interval (use the fact that the 2.5th percentile of Beta(11,11) ≈ 0.30).
Ex. 109.25 · Modeling
Historically, 70% of the students of a test prep course pass the ENEM. In a new cohort of 20 students, 15 passed. Propose a suitable Beta prior, then calculate the posterior and the posterior mean of the pass rate.
Ex. 109.26 · Modeling · Answer key
Prevalence of pancreatic cancer: 0.2%. Biopsy: sensitivity 92%, specificity 97%. The test is positive. Calculate P(cancer | positive) and discuss the medical decision.
Ex. 109.27 · Modeling
A shipping company reports 20 delayed deliveries in 50 monitored deliveries. Using prior Beta(1,1), estimate the delay rate with a 90% credible interval.
Ex. 109.28 · Modeling · Answer key
A fintech knows that 1% of transactions are fraudulent. An algorithm detects that the current transaction has a value outside the customer's normal pattern. P(abnormal value | fraud) = 85%, P(abnormal value | legitimate) = 2%. Calculate the probability of fraud.
Ex. 109.29 · Proof
Show that, for the Bernoulli model with Beta prior, the posterior is also Beta. Identify the parameters.
Ex. 109.30 · Proof · Answer key
Prove that, with uniform prior Beta(1,1), the MAP estimator coincides with the MLE for the Bernoulli model.
Ex. 109.31 · Application · Answer key
Spam filter: 20% of emails are spam. In spam emails, each suspicious keyword appears with probability 60%; in legitimate emails, 5%. An email has 3 keywords. What is the probability it is spam?
Ex. 109.32 · Application
Two groups of rats: lineage 1 (10 animals, 8 developed tumor after exposure) and lineage 2 (10 animals, 3 developed). Prior Beta(1,1) for both rates. Calculate the posterior and posterior mean for each lineage.
Ex. 109.33 · Application
An urn has unknown proportion of orange balls. After 100 draws with replacement, 50 are orange. Prior Beta(1,1). Calculate the posterior, the mean, and the 95% credible interval.
Ex. 109.34 · Challenge
The Jeffreys prior for Bernoulli is Beta(0.5, 0.5). After 6 successes in 10, calculate the posterior. Research what it means for this prior to be "invariant under parametrization" and compare the posterior mean with the Beta(1,1) prior.

Sources

Think Bayes — Allen B. Downey · CC-BY-NC-SA · Greenteapress · Chapters 1–9.
Introduction to Probability — Grinstead & Snell · GNU FDL · Dartmouth · §4.1.
OpenIntro Statistics — Diez, Çetinkaya-Rundel, Barr · CC-BY-SA · OpenIntro · §3.6–3.7.