Conditional Probability & Bayes' Theorem

What Is Conditional Probability?

Conditional probability asks: Given that we know something has happened, how does that change the probability of another event?

Without information: P(it rains today) = 0.3
Given dark clouds: P(it rains | dark clouds) = 0.7

The new information (dark clouds) updates our probability estimate.

The Conditional Probability Formula

P(A | B) = P(A and B) / P(B)

Read: "Probability of A given B"
Valid only when P(B) > 0

Intuition

When we're told B has happened, we restrict our sample space to B. Within that restricted space, we ask what fraction of it also belongs to A.

Roll a die. S = {1, 2, 3, 4, 5, 6}

Event A = {2, 4, 6} (even)
Event B = {4, 5, 6} (greater than 3)

P(A) = 3/6 = 0.5 (unconditional)
P(A | B) = P(A and B) / P(B)

A and B = {4, 6} → P(A and B) = 2/6
P(B) = 3/6

P(A | B) = (2/6) / (3/6) = 2/3 ≈ 0.667

Interpretation: Given the die shows > 3, the probability it's even = 2/3
(Because within {4,5,6}, two of the three outcomes are even)
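A quick way to check the restriction idea is to enumerate the outcomes. The Python sketch below is a minimal illustration, assuming a fair die with equally likely faces; the helper `prob` is a hypothetical name, not a library function. It computes P(A | B) once with the formula and once by treating B as the new sample space:

```python
from fractions import Fraction

S = set(range(1, 7))                # fair six-sided die
A = {x for x in S if x % 2 == 0}    # even: {2, 4, 6}
B = {x for x in S if x > 3}         # greater than 3: {4, 5, 6}

def prob(event, space=S):
    """Share of the equally likely outcomes in `space` that also lie in `event`."""
    return Fraction(len(event & space), len(space))

p_A = prob(A)                           # unconditional P(A)
p_A_given_B = prob(A & B) / prob(B)     # formula: P(A and B) / P(B)
p_A_within_B = prob(A, space=B)         # restrict the sample space to B

print(p_A, p_A_given_B, p_A_within_B)   # 1/2 2/3 2/3
```

Both routes give 2/3. `Fraction` keeps the arithmetic exact, so the comparison is not muddied by floating-point rounding.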

Non-Symmetry of Conditioning

P(A | B) ≠ P(B | A) in general

Classic error (Prosecutor's Fallacy):
P(DNA matches | innocent) = 0.0001 (very unlikely)
P(innocent | DNA matches) ≠ 0.0001 (depends on base rate of guilt!)

The Multiplication Rule (General Form)

P(A and B) = P(A) × P(B | A)
           = P(B) × P(A | B)

Two cards drawn from a deck (without replacement):
P(first Ace AND second Ace) = P(first Ace) × P(second Ace | first Ace)
                             = 4/52 × 3/51
                             = 12/2652 = 1/221
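The multiplication rule can be cross-checked by brute force. A minimal sketch, assuming the 52 cards are labelled by position, with positions 0 to 3 standing for the four Aces (an arbitrary labelling chosen for illustration):

```python
from fractions import Fraction
from itertools import permutations

ACES = set(range(4))   # positions 0-3 represent the four Aces

# General multiplication rule: P(first Ace) × P(second Ace | first Ace)
p_rule = Fraction(4, 52) * Fraction(3, 51)

# Brute force: every ordered pair of two different cards is equally likely
draws = list(permutations(range(52), 2))
both_aces = sum(1 for first, second in draws if first in ACES and second in ACES)
p_enum = Fraction(both_aces, len(draws))

print(p_rule, p_enum, len(draws))   # 1/221 1/221 2652
```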

Statistical Independence

Two events A and B are independent if knowing B happened tells us nothing about whether A happened:

Formal definition:
P(A | B) = P(A)     [knowing B doesn't change P(A); requires P(B) > 0]

Equivalent form (also usable when P(B) = 0, so it is often taken as the definition):
P(A and B) = P(A) × P(B)

Testing Independence

Two coin flips: A = {H on flip 1}, B = {H on flip 2}
P(A) = 0.5
P(A | B) = 0.5 ← same! → Independent

Drawing cards without replacement:
A = {first card is Ace}, B = {second card is Ace}
P(A) = 4/52 ≈ 0.077
P(A | B) = 3/51 ≈ 0.059 ← different! → Dependent (as expected — we removed one Ace)
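Both checks can be reproduced by enumerating equally likely outcomes. In this sketch the helpers `prob` and `cond_prob` are hypothetical names, events are passed as predicates, and cards 0 to 3 are an arbitrary stand-in for the four Aces:

```python
from fractions import Fraction
from itertools import permutations, product

def prob(outcomes, event):
    """P(event) over a list of equally likely outcomes; `event` is a predicate."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(outcomes, event, given):
    """P(event | given): keep only the outcomes where `given` holds."""
    return prob([o for o in outcomes if given(o)], event)

# Two fair coin flips: A = H on flip 1, B = H on flip 2
flips = list(product("HT", repeat=2))
coin_p = prob(flips, lambda o: o[0] == "H")
coin_cond = cond_prob(flips, lambda o: o[0] == "H", lambda o: o[1] == "H")

# Two cards without replacement: A = first is Ace, B = second is Ace
draws = list(permutations(range(52), 2))
card_p = prob(draws, lambda d: d[0] < 4)
card_cond = cond_prob(draws, lambda d: d[0] < 4, lambda d: d[1] < 4)

print(coin_p, coin_cond)   # 1/2 1/2   -> same, independent
print(card_p, card_cond)   # 1/13 1/17 -> different, dependent
```

Here 1/13 and 1/17 are 4/52 and 3/51 in lowest terms.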

Independence vs Mutual Exclusivity

These are often confused:

Mutually exclusive: P(A and B) = 0 — they CAN'T both happen
Independent: P(A and B) = P(A)×P(B) — one doesn't affect the other

Two mutually exclusive events (with P(A)>0 and P(B)>0) are NEVER independent:
P(A | B) = P(A and B)/P(B) = 0/P(B) = 0 ≠ P(A)
→ Knowing B happened makes A impossible → definitely not independent!
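A concrete check with a die, using two hypothetical disjoint events A = {1, 2} and B = {5, 6} (chosen for illustration, not from the text):

```python
from fractions import Fraction

S = set(range(1, 7))   # fair die
A = {1, 2}             # hypothetical event: roll 1 or 2
B = {5, 6}             # hypothetical event: roll 5 or 6 (disjoint from A)

def prob(event):
    return Fraction(len(event), len(S))

p_both = prob(A & B)                  # mutually exclusive, so 0
p_product = prob(A) * prob(B)         # what independence would require
p_A_given_B = prob(A & B) / prob(B)   # knowing B rules A out

print(p_both, p_product, p_A_given_B)   # 0 1/9 0
```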

The Law of Total Probability

If B₁, B₂, ..., Bₙ partition the sample space (mutually exclusive and exhaustive):

P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂) + ... + P(A|Bₙ)P(Bₙ)
     = Σ P(A|Bᵢ) × P(Bᵢ)

Example: Loan Approval

Loan applicants come from three credit rating categories:

Category   P(applicant in category)   P(approved | category)
Good              0.60                        0.90
Fair              0.30                        0.60
Poor              0.10                        0.20

P(approved) = P(A|Good)P(Good) + P(A|Fair)P(Fair) + P(A|Poor)P(Poor)
            = 0.90×0.60 + 0.60×0.30 + 0.20×0.10
            = 0.540 + 0.180 + 0.020
            = 0.740 (74% approval rate overall)
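The weighted sum translates directly into a function over the partition. A minimal Python sketch; `total_probability` is a hypothetical helper, and its check that the P(Bᵢ) sum to 1 is only a sanity test, not proof that the categories really partition the applicants:

```python
def total_probability(priors, likelihoods):
    """P(A) = sum over i of P(A | B_i) * P(B_i), where the B_i partition S."""
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("P(B_i) must sum to 1 for a partition")
    return sum(likelihoods[k] * priors[k] for k in priors)

p_category = {"Good": 0.60, "Fair": 0.30, "Poor": 0.10}          # P(category)
p_approved_given = {"Good": 0.90, "Fair": 0.60, "Poor": 0.20}    # P(approved | category)

p_approved = total_probability(p_category, p_approved_given)
print(round(p_approved, 3))   # 0.74
```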

Bayes' Theorem

Bayes' theorem updates a prior probability (our belief before new evidence) to a posterior probability (our belief after seeing evidence).

P(B | A) = P(A | B) × P(B) / P(A)

Using the Law of Total Probability for P(A):
P(B | A) = P(A | B) × P(B) / [P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)]

The Vocabulary

Prior probability P(B): probability before seeing evidence
Likelihood P(A|B): probability of seeing the evidence if B is true
Posterior probability P(B|A): probability after seeing evidence
Marginal probability P(A): overall probability of seeing the evidence (law of total probability)

Bayes' Theorem in Practice

Example 1: Medical Testing

Disease D affects 1% of the population: P(D) = 0.01, P(no D) = 0.99

Test characteristics:
- Sensitivity: P(positive | disease) = 0.95
- False positive rate: P(positive | no disease) = 0.05

You test positive. What is P(disease | positive)?

Using Bayes:
P(D | positive) = P(positive | D) × P(D) / P(positive)

First, find P(positive) using law of total probability:
P(positive) = P(positive|D)×P(D) + P(positive|no D)×P(no D)
            = 0.95 × 0.01 + 0.05 × 0.99
            = 0.0095 + 0.0495
            = 0.059

P(D | positive) = (0.95 × 0.01) / 0.059
               = 0.0095 / 0.059
               = 0.161 ≈ 16%

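Surprising result: even with a 95% sensitive test, a positive result means only about a 16% chance of actually having the disease. The disease is rare (1% prevalence), so false positives from the large healthy group (0.05 × 0.99 = 0.0495) outnumber true positives from the small sick group (0.95 × 0.01 = 0.0095) by more than five to one.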

This is why mass screening for rare diseases generates many false positives.
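To see how strongly the answer depends on prevalence, here is a minimal Python sketch of the two-hypothesis form of Bayes' theorem. It keeps the example's sensitivity (0.95) and false positive rate (0.05) fixed; the prevalence values other than 0.01 are arbitrary illustrations, and the function name `posterior` is hypothetical:

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for a hypothesis H and its complement.

    P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]
    """
    marginal = p_e_given_h * prior + p_e_given_not_h * (1 - prior)   # law of total probability
    return p_e_given_h * prior / marginal

SENSITIVITY = 0.95           # P(positive | disease)
FALSE_POSITIVE_RATE = 0.05   # P(positive | no disease)

# 0.01 is the example's prevalence; the others are illustrative
for prevalence in (0.001, 0.01, 0.10, 0.50):
    ppv = posterior(prevalence, SENSITIVITY, FALSE_POSITIVE_RATE)
    print(f"prevalence {prevalence:>5.1%}: P(disease | positive) = {ppv:.3f}")
```

At 1% prevalence this reproduces 0.161; at 0.1% the same test gives about 0.019, and at 50% it gives 0.950.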

Example 2: Spam Filter

Prior: P(spam) = 0.30 (30% of emails are spam)

Evidence: email contains the word "offer"
P("offer" | spam) = 0.70
P("offer" | not spam) = 0.15

P(spam | "offer") = P("offer"|spam) × P(spam) / P("offer")

P("offer") = 0.70×0.30 + 0.15×0.70 = 0.21 + 0.105 = 0.315

P(spam | "offer") = (0.70 × 0.30) / 0.315 = 0.21/0.315 = 0.667

→ If an email contains "offer", there's a 66.7% probability it's spam
(updated from the prior of 30%)
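Redoing the calculation with exact fractions shows that the 0.667 above is exactly 2/3. A minimal sketch; the variable names (including `ham`, common shorthand for non-spam email) are illustrative:

```python
from fractions import Fraction

p_spam = Fraction(30, 100)               # prior P(spam)
p_offer_given_spam = Fraction(70, 100)   # likelihood under spam
p_offer_given_ham = Fraction(15, 100)    # likelihood under not spam ("ham")

# Law of total probability for the marginal P("offer")
p_offer = p_offer_given_spam * p_spam + p_offer_given_ham * (1 - p_spam)

# Bayes' theorem: posterior P(spam | "offer")
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer

print(p_offer, p_spam_given_offer, float(p_spam_given_offer))   # 63/200 2/3 0.6666666666666666
```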

Example 3: Quality Control / Supplier Audit

Three suppliers provide components:
Supplier A: 50% of supply, 2% defect rate
Supplier B: 30% of supply, 5% defect rate
Supplier C: 20% of supply, 10% defect rate

A defective item is found. Which supplier most likely made it?

P(defective) = 0.02×0.50 + 0.05×0.30 + 0.10×0.20
             = 0.010 + 0.015 + 0.020 = 0.045

P(A | defective) = (0.02 × 0.50) / 0.045 = 0.010/0.045 = 22.2%
P(B | defective) = (0.05 × 0.30) / 0.045 = 0.015/0.045 = 33.3%
P(C | defective) = (0.10 × 0.20) / 0.045 = 0.020/0.045 = 44.4%

→ Despite supplying only 20%, Supplier C is the most likely source (44.4%)
→ Action: focus quality audit on Supplier C
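With more than two hypotheses the steps are the same over the whole partition: multiply each prior by its likelihood, then divide by the sum. A minimal sketch; `posteriors` is a hypothetical helper name:

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem over a partition: P(B_i | E) for every hypothesis B_i."""
    joint = {k: likelihoods[k] * priors[k] for k in priors}   # P(E | B_i) P(B_i)
    marginal = sum(joint.values())                             # P(E), law of total probability
    return {k: v / marginal for k, v in joint.items()}

share = {"A": 0.50, "B": 0.30, "C": 0.20}          # P(supplier)
defect_rate = {"A": 0.02, "B": 0.05, "C": 0.10}    # P(defective | supplier)

post = posteriors(share, defect_rate)
for supplier, p in post.items():
    print(f"P({supplier} | defective) = {p:.1%}")
print("Most likely source:", max(post, key=post.get))
```

The posteriors always sum to 1, because dividing by the marginal is exactly what normalizes them.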

Using a Contingency Table for Bayes

It is often easier to work with frequencies (counts in an imagined population) than with probabilities:

Medical test example with n = 100,000 people:

Population: 100,000
Disease prevalence: 1% → 1,000 have disease, 99,000 don't

                  | Disease | No Disease | Total
Test Positive     |   950   |   4,950    | 5,900
Test Negative     |    50   |  94,050    | 94,100
Total             | 1,000   |  99,000    | 100,000

Calculations:
True Positives: 1,000 × 0.95 = 950
False Positives: 99,000 × 0.05 = 4,950
Total positives: 950 + 4,950 = 5,900

P(disease | positive) = 950 / 5,900 = 16.1%  ← same answer as before!

The table makes the 16% visually clear: of the 5,900 positive tests, only 950 come from people who actually have the disease.
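The frequency table can also be generated from its four inputs (population size, prevalence, sensitivity, false positive rate). A minimal sketch; `contingency_table` is a hypothetical helper, and it rounds expected counts to whole people, which is exact here but only approximate when n × rate is not an integer:

```python
def contingency_table(n, prevalence, sensitivity, false_positive_rate):
    """Expected counts for a diagnostic test in a population of n people."""
    diseased = round(n * prevalence)
    healthy = n - diseased
    tp = round(diseased * sensitivity)          # true positives
    fp = round(healthy * false_positive_rate)   # false positives
    return {"TP": tp, "FN": diseased - tp, "FP": fp, "TN": healthy - fp}

t = contingency_table(100_000, 0.01, 0.95, 0.05)
positives = t["TP"] + t["FP"]
print(t)   # {'TP': 950, 'FN': 50, 'FP': 4950, 'TN': 94050}
print(f"P(disease | positive) = {t['TP']}/{positives} = {t['TP'] / positives:.3f}")
```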

Sequential Updating

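Bayes' theorem can be applied repeatedly: each posterior becomes the prior for the next piece of evidence.

Start with prior P(D) = 0.30

Test 1 positive → posterior P(D | T₁+) = 0.75 (the exact value depends on the test's sensitivity and false positive rate; it becomes the new prior)
Test 2 positive → posterior P(D | T₁+, T₂+) = ?
  → Use 0.75 as the new prior, apply Bayes again (valid when the two results are independent given disease status)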

Each test result updates our belief incrementally.
This is the foundation of Bayesian statistics.
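The text does not give the test's likelihoods, so the sketch below assumes hypothetical values P(positive | D) = 0.70 and P(positive | no D) = 0.10, which happen to turn the 0.30 prior into the 0.75 posterior quoted above. It also assumes the two test results are conditionally independent given disease status; without that assumption the second update would need P(T₂+ | T₁+, D) instead. The function name `update` is illustrative:

```python
def update(prior, p_pos_given_d, p_pos_given_not_d):
    """One Bayes update after a positive result; returns P(D | positive)."""
    numerator = p_pos_given_d * prior
    return numerator / (numerator + p_pos_given_not_d * (1 - prior))

# Hypothetical test characteristics (not given in the text)
P_POS_GIVEN_D = 0.70
P_POS_GIVEN_NOT_D = 0.10

belief = 0.30            # starting prior P(D)
history = [belief]
for _ in range(2):       # two positive results in a row
    belief = update(belief, P_POS_GIVEN_D, P_POS_GIVEN_NOT_D)   # posterior becomes the next prior
    history.append(belief)

print([round(p, 4) for p in history])   # [0.3, 0.75, 0.9545]
```

Under the same independence assumption, a single update with the combined likelihoods 0.70² and 0.10² gives the same 0.9545, so the order in which evidence is absorbed does not matter.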

Common Mistakes

1. Ignoring base rates (Base Rate Neglect)

"The test is 95% accurate → if positive, I'm 95% likely to have the disease."
WRONG — the 16% calculation above shows this is false for rare diseases.
Always consider the prior probability (prevalence).

2. The Prosecutor's Fallacy

Forensics: P(DNA match | innocent) = 1 in 1,000,000
Prosecutor: "There's only a 1 in 1,000,000 chance the suspect is innocent."
WRONG: P(innocent | DNA match) depends on how many people were in the database
       and the prior probability of guilt.
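A hypothetical population makes the gap concrete. The sketch assumes (none of this comes from the text) that 1,000,000 innocent people plus the one guilty person could have left the sample, that all of them are equally likely a priori, and that the guilty person's DNA always matches:

```python
from fractions import Fraction

p_match_given_innocent = Fraction(1, 1_000_000)   # from the forensic figure
p_match_given_guilty = Fraction(1)                # assumption: the true source always matches
n_innocent = 1_000_000                            # hypothetical pool of other possible sources
prior_guilty = Fraction(1, n_innocent + 1)        # assumption: everyone equally suspect a priori

# Law of total probability, then Bayes' theorem
p_match = p_match_given_guilty * prior_guilty + p_match_given_innocent * (1 - prior_guilty)
p_innocent_given_match = p_match_given_innocent * (1 - prior_guilty) / p_match

print(p_match_given_innocent, p_innocent_given_match)   # 1/1000000 1/2
```

About one innocent person in the pool is expected to match as well, so a match alone leaves roughly even odds, not one in a million.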

3. Confusing P(A|B) and P(B|A)

P(symptoms | flu) ≠ P(flu | symptoms)

High P(symptoms | flu) doesn't mean high P(flu | symptoms) if flu is rare.

4. Multiplying dependent probabilities as if independent

Drawing two items from the same supplier's batch without replacement makes the draws dependent:
P(both defective) = P(first defective) × P(second defective | first defective), not P(defective) × P(defective).
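A small hypothetical batch shows the size of the error. The sketch assumes 10 items of which 2 are defective (numbers chosen for illustration) and checks the conditional-probability answer by enumerating every pair:

```python
from fractions import Fraction
from itertools import combinations

BATCH, DEFECTIVE = 10, 2   # hypothetical batch: items 0 and 1 are defective

correct = Fraction(DEFECTIVE, BATCH) * Fraction(DEFECTIVE - 1, BATCH - 1)   # P(D1) × P(D2 | D1)
naive = Fraction(DEFECTIVE, BATCH) ** 2                                      # wrongly treats draws as independent

# Check by enumerating every unordered pair of distinct items
pairs = list(combinations(range(BATCH), 2))
both = sum(1 for i, j in pairs if i < DEFECTIVE and j < DEFECTIVE)
exact = Fraction(both, len(pairs))

print(correct, exact, naive)   # 1/45 1/45 1/25
```

The naive product overstates the chance by almost a factor of two because it ignores that the first defective item is no longer in the batch.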

Practice Exercises

1. In a city, 40% of drivers are women. The probability that a woman driver has an accident is 0.30; for a man it is 0.50. A driver is in an accident. What is the probability the driver is a woman?
2. A factory produces 60% of products on Machine A and 40% on Machine B. Machine A produces 3% defectives; Machine B produces 7%. A random product is found defective. Which machine most likely made it? (Find P(A | defective) and P(B | defective).)
3. Are events A and B independent if P(A) = 0.4, P(B) = 0.3, P(A and B) = 0.12? Verify.
4. A disease affects 2% of the population. A test has 90% sensitivity and an 8% false positive rate. Using a contingency table with n = 100,000, find P(disease | positive test).
5. A credit analyst believes there's a 25% chance a borrower will default (prior). An early payment miss (new evidence) has P(miss | will default) = 0.80 and P(miss | will not default) = 0.10. After the missed payment, what is the updated probability of default?

Summary

In this chapter you learned:

Conditional probability: P(A|B) = P(A and B) / P(B) — probability of A given B occurred; restricts sample space to B
Multiplication rule (general): P(A and B) = P(A) × P(B|A)
Independence: P(A|B) = P(A) — equivalent to P(A and B) = P(A)×P(B)
Independence ≠ mutual exclusivity — mutually exclusive events with positive probability are always dependent
Law of total probability: P(A) = ΣP(A|Bᵢ)P(Bᵢ) over a partition of S
Bayes' theorem: P(B|A) = P(A|B)×P(B) / P(A) — updates prior to posterior with evidence
Prior = belief before evidence; Posterior = belief after evidence
Base rate neglect: for low-prevalence conditions, the positive predictive value (PPV, P(condition | positive)) stays low even with a sensitive test
Contingency table method often clearer than formulas for Bayes calculations
Sequential updating: apply Bayes repeatedly as new evidence arrives

Next up: Probability Distributions — the Binomial and Poisson distributions for counting discrete outcomes.