What Is Conditional Probability?
Conditional probability asks: Given that we know something has happened, how does that change the probability of another event?
Without information: P(it rains today) = 0.3
Given dark clouds: P(it rains | dark clouds) = 0.7
The new information (dark clouds) updates our probability estimate.
The Conditional Probability Formula
P(A | B) = P(A and B) / P(B)
Read: "Probability of A given B"
Valid only when P(B) > 0
Intuition
When we're told B has happened, we restrict our sample space to B. Within that restricted space, we ask what fraction of the probability also belongs to A.
Roll a die. S = {1, 2, 3, 4, 5, 6}
Event A = {2, 4, 6} (even)
Event B = {4, 5, 6} (greater than 3)
P(A) = 3/6 = 0.5 (unconditional)
P(A | B) = P(A and B) / P(B)
A and B = {4, 6} → P(A and B) = 2/6
P(B) = 3/6
P(A | B) = (2/6) / (3/6) = 2/3 ≈ 0.667
Interpretation: Given that the die shows more than 3, the probability that it is even is 2/3
(Because within {4,5,6}, two of the three outcomes are even)
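The die example can be checked by enumerating outcomes. The following is a minimal sketch, assuming a fair die (all outcomes equally likely); the helper names `prob` and `cond_prob` are illustrative, not part of the original text.

```python
from fractions import Fraction

def prob(event, space):
    """Probability of an event when all outcomes in `space` are equally likely."""
    return Fraction(len(event & space), len(space))

def cond_prob(a, b, space):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) = 0."""
    p_b = prob(b, space)
    if p_b == 0:
        raise ValueError("P(B) must be > 0")
    return prob(a & b, space) / p_b

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even
B = {4, 5, 6}   # greater than 3

print(prob(A, S))          # 1/2
print(cond_prob(A, B, S))  # 2/3
```

Using `Fraction` keeps the arithmetic exact, so the result matches the hand calculation (2/6) / (3/6) = 2/3 without rounding.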
Non-Symmetry of Conditioning
P(A | B) ≠ P(B | A) in general
Classic error (Prosecutor's Fallacy):
P(DNA matches | innocent) = 0.0001 (very unlikely)
P(innocent | DNA matches) ≠ 0.0001 (depends on base rate of guilt!)
The Multiplication Rule (General Form)
P(A and B) = P(A) × P(B | A)
= P(B) × P(A | B)
Two cards drawn from a deck (without replacement):
P(first Ace AND second Ace) = P(first Ace) × P(second Ace | first Ace)
= 4/52 × 3/51
= 12/2652 = 1/221
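As a sanity check on the two-Ace calculation, the sketch below compares the multiplication rule with a brute-force count over every ordered two-card draw. The deck representation (4 cards labeled "A", 48 labeled "x") is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import permutations

# Multiplication rule: P(first Ace) * P(second Ace | first Ace)
p_rule = Fraction(4, 52) * Fraction(3, 51)

# Brute-force check: 4 Aces plus 48 other cards, all ordered two-card draws
deck = ["A"] * 4 + ["x"] * 48
draws = list(permutations(range(52), 2))   # 52 * 51 = 2652 ordered pairs
both_aces = sum(1 for i, j in draws if deck[i] == "A" and deck[j] == "A")
p_count = Fraction(both_aces, len(draws))

print(p_rule, p_count)  # 1/221 1/221
```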
Statistical Independence
Two events A and B are independent if knowing B happened tells us nothing about whether A happened:
Formal definition:
P(A | B) = P(A), for P(B) > 0 [knowing B doesn't change P(A)]
Equivalent to:
P(A and B) = P(A) × P(B)
Testing Independence
Two coin flips: A = {H on flip 1}, B = {H on flip 2}
P(A) = 0.5
P(A | B) = 0.5 ← same! → Independent
Drawing cards without replacement:
A = {first card is Ace}, B = {second card is Ace}
P(A) = 4/52 ≈ 0.077
P(A | B) = 3/51 ≈ 0.059 ← different! → Dependent (if the second card is an Ace, only 3 Aces remain among the other 51 cards for the first)
Independence vs Mutual Exclusivity
These are often confused:
Mutually exclusive: P(A and B) = 0 — they CAN'T both happen
Independent: P(A and B) = P(A)×P(B) — one doesn't affect the other
Two mutually exclusive events (with P(A)>0 and P(B)>0) are NEVER independent:
P(A | B) = P(A and B)/P(B) = 0/P(B) = 0 ≠ P(A)
→ Knowing B happened makes A impossible → definitely not independent!
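Both ideas, testing independence and the contrast with mutual exclusivity, can be checked on small sample spaces. This sketch assumes equally likely outcomes; the function `independent` and the die events `low` and `high` are hypothetical names and examples chosen for illustration.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Probability of `event` (a subset of `space`) with equally likely outcomes."""
    return Fraction(len(event), len(space))

def independent(a, b, space):
    """True when P(A and B) == P(A) * P(B)."""
    return prob(a & b, space) == prob(a, space) * prob(b, space)

# Two coin flips: heads on flip 1 vs heads on flip 2
flips = set(product("HT", repeat=2))
h1 = {o for o in flips if o[0] == "H"}
h2 = {o for o in flips if o[1] == "H"}
print(independent(h1, h2, flips))   # True

# One die roll: two mutually exclusive events with positive probability
die = {1, 2, 3, 4, 5, 6}
low, high = {1, 2}, {5, 6}
print(independent(low, high, die))  # False
```

The second check fails because P(low and high) = 0 while P(low) × P(high) = 1/9, which is the argument from the text in numeric form.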
The Law of Total Probability
If B₁, B₂, ..., Bₙ partition the sample space (mutually exclusive and exhaustive):
P(A) = P(A|B₁)P(B₁) + P(A|B₂)P(B₂) + ... + P(A|Bₙ)P(Bₙ)
= Σ P(A|Bᵢ) × P(Bᵢ)
Example: Loan Approval
Loan applicants come from three credit rating categories:
Category | P(applicant in category) | P(approved given category)
Good     | 0.60                     | 0.90
Fair     | 0.30                     | 0.60
Poor     | 0.10                     | 0.20
P(approved) = P(A|Good)P(Good) + P(A|Fair)P(Fair) + P(A|Poor)P(Poor)
= 0.90×0.60 + 0.60×0.30 + 0.20×0.10
= 0.540 + 0.180 + 0.020
= 0.740 (74% approval rate overall)
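The loan calculation is a weighted sum, which a few lines of Python can reproduce. This is a minimal sketch using the figures from the loan example; the dictionary layout is just one convenient way to hold them.

```python
import math

# (P(category), P(approved | category)) from the loan example
categories = {
    "Good": (0.60, 0.90),
    "Fair": (0.30, 0.60),
    "Poor": (0.10, 0.20),
}

# The categories must partition the applicants: their probabilities sum to 1
assert math.isclose(sum(p for p, _ in categories.values()), 1.0)

# Law of total probability: sum of P(approved | category) * P(category)
p_approved = sum(p_cat * p_app for p_cat, p_app in categories.values())
print(round(p_approved, 3))  # 0.74
```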
Bayes' Theorem
Bayes' theorem updates a prior probability (our belief before new evidence) to a posterior probability (our belief after seeing evidence).
P(B | A) = P(A | B) × P(B) / P(A)
Using the Law of Total Probability for P(A):
P(B | A) = P(A | B) × P(B) / [P(A|B)P(B) + P(A|Bᶜ)P(Bᶜ)]
The Vocabulary
- Prior probability P(B): probability before seeing evidence
- Likelihood P(A|B): probability of seeing the evidence if B is true
- Posterior probability P(B|A): probability after seeing evidence
- Marginal probability P(A): overall probability of seeing the evidence (law of total probability)
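The vocabulary maps directly onto a small function. The sketch below is one possible way to organize the computation for any partition of hypotheses; the function name `posterior` and the numbers in the usage lines are hypothetical, chosen only to show the call.

```python
def posterior(priors, likelihoods):
    """Return P(B_i | A) for every hypothesis B_i in a partition.

    priors[h]      = P(B_h)       (prior)
    likelihoods[h] = P(A | B_h)   (likelihood)
    """
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    marginal = sum(joint.values())   # P(A), by the law of total probability
    if marginal == 0:
        raise ValueError("P(A) must be > 0")
    return {h: joint[h] / marginal for h in joint}

# Hypothetical numbers, for illustration only
result = posterior({"B": 0.2, "not B": 0.8}, {"B": 0.9, "not B": 0.3})
print(result)
```

The returned posteriors always sum to 1, because each one is a share of the same marginal P(A).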
Bayes' Theorem in Practice
Example 1: Medical Testing
Disease D affects 1% of the population: P(D) = 0.01, P(no D) = 0.99
Test characteristics:
- Sensitivity: P(positive | disease) = 0.95
- False positive rate: P(positive | no disease) = 0.05
You test positive. What is P(disease | positive)?
Using Bayes:
P(D | positive) = P(positive | D) × P(D) / P(positive)
First, find P(positive) using law of total probability:
P(positive) = P(positive|D)×P(D) + P(positive|no D)×P(no D)
= 0.95 × 0.01 + 0.05 × 0.99
= 0.0095 + 0.0495
= 0.059
P(D | positive) = (0.95 × 0.01) / 0.059
= 0.0095 / 0.059
= 0.161 ≈ 16%
Surprising result: Even with a 95% sensitive test, a positive result means only a 16% chance of actually having the disease — because the disease is rare (1% prevalence).
This is why mass screening for rare diseases generates many false positives.
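To see how strongly the result depends on prevalence, the sketch below recomputes P(disease | positive) for the same test at different prevalence levels. Only the 1% figure comes from the example; the 10% and 50% values are hypothetical, added for comparison.

```python
def p_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive) via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# 0.01 is the prevalence from the example; 0.10 and 0.50 are hypothetical
for prevalence in (0.01, 0.10, 0.50):
    value = p_disease_given_positive(prevalence, 0.95, 0.05)
    print(prevalence, round(value, 3))
```

With the test unchanged, the probability rises from about 16% at 1% prevalence to about 68% at 10% and 95% at 50%.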
Example 2: Spam Filter
Prior: P(spam) = 0.30 (30% of emails are spam)
Evidence: email contains the word "offer"
P("offer" | spam) = 0.70
P("offer" | not spam) = 0.15
P(spam | "offer") = P("offer"|spam) × P(spam) / P("offer")
P("offer") = 0.70×0.30 + 0.15×0.70 = 0.21 + 0.105 = 0.315
P(spam | "offer") = (0.70 × 0.30) / 0.315 = 0.21/0.315 = 0.667
→ If an email contains "offer", there's a 66.7% probability it's spam
(updated from the prior of 30%)
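A simulation offers an independent check on the spam calculation. The sketch below swaps the formula for a Monte Carlo estimate: it generates emails at the stated rates and counts how many of those containing "offer" are spam. The sample size and seed are arbitrary assumptions.

```python
import random

def simulate_spam_given_offer(n_emails=200_000, seed=0):
    """Estimate P(spam | "offer") by simulating emails from the stated rates."""
    rng = random.Random(seed)
    offer_total = 0
    offer_and_spam = 0
    for _ in range(n_emails):
        is_spam = rng.random() < 0.30            # P(spam) = 0.30
        p_offer = 0.70 if is_spam else 0.15      # P("offer" | spam / not spam)
        if rng.random() < p_offer:
            offer_total += 1
            offer_and_spam += is_spam            # True counts as 1
    return offer_and_spam / offer_total

estimate = simulate_spam_given_offer()
print(round(estimate, 3))
```

The estimate should land close to the exact value 0.21 / 0.315 ≈ 0.667, with small run-to-run variation if the seed is changed.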
Example 3: Quality Control / Supplier Audit
Three suppliers provide components:
Supplier A: 50% of supply, 2% defect rate
Supplier B: 30% of supply, 5% defect rate
Supplier C: 20% of supply, 10% defect rate
A defective item is found. Which supplier most likely made it?
P(defective) = 0.02×0.50 + 0.05×0.30 + 0.10×0.20
= 0.010 + 0.015 + 0.020 = 0.045
P(A | defective) = (0.02 × 0.50) / 0.045 = 0.010/0.045 = 22.2%
P(B | defective) = (0.05 × 0.30) / 0.045 = 0.015/0.045 = 33.3%
P(C | defective) = (0.10 × 0.20) / 0.045 = 0.020/0.045 = 44.4%
→ Despite supplying only 20%, Supplier C is the most likely source (44.4%)
→ Action: focus quality audit on Supplier C
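The supplier audit extends naturally to code, since the same two steps (joint probabilities, then normalization) apply to any number of suppliers. This is a minimal sketch using the figures above with exact fractions.

```python
from fractions import Fraction

# supplier: (share of supply, defect rate)
suppliers = {
    "A": (Fraction(50, 100), Fraction(2, 100)),
    "B": (Fraction(30, 100), Fraction(5, 100)),
    "C": (Fraction(20, 100), Fraction(10, 100)),
}

# P(defective and supplier), then P(defective) by total probability
joint = {name: share * rate for name, (share, rate) in suppliers.items()}
p_defective = sum(joint.values())                  # 9/200 = 0.045
post = {name: j / p_defective for name, j in joint.items()}

for name, p in post.items():
    print(name, p, f"{float(p):.1%}")              # A 2/9 22.2%, B 1/3 33.3%, C 4/9 44.4%
print("most likely source:", max(post, key=post.get))  # C
```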
Using a Contingency Table for Bayes
It is often easier to compute with frequencies than with probabilities:
Medical test example with n = 100,000 people:
Population: 100,000
Disease prevalence: 1% → 1,000 have disease, 99,000 don't
| Disease | No Disease | Total
Test Positive | 950 | 4,950 | 5,900
Test Negative | 50 | 94,050 | 94,100
Total | 1,000 | 99,000 | 100,000
Calculations:
True Positives: 1,000 × 0.95 = 950
False Positives: 99,000 × 0.05 = 4,950
Total positives: 950 + 4,950 = 5,900
P(disease | positive) = 950 / 5,900 = 16.1% ← same answer as before!
The table makes it clear why the answer is only 16%: out of 5,900 positive tests, just 950 come from people who actually have the disease.
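The frequency method can be sketched with integer counts, which avoids decimal arithmetic altogether. The variable names and the print layout are illustrative choices; the rates are the ones from the medical test example.

```python
n = 100_000
prevalence_pct, sensitivity_pct, false_pos_pct = 1, 95, 5  # percentages from the example

diseased = n * prevalence_pct // 100            # 1,000
healthy = n - diseased                          # 99,000
true_pos = diseased * sensitivity_pct // 100    # 950
false_pos = healthy * false_pos_pct // 100      # 4,950
false_neg = diseased - true_pos                 # 50
true_neg = healthy - false_pos                  # 94,050

print(f"{'':14}{'Disease':>10}{'No Disease':>12}{'Total':>10}")
print(f"{'Test Positive':14}{true_pos:>10,}{false_pos:>12,}{true_pos + false_pos:>10,}")
print(f"{'Test Negative':14}{false_neg:>10,}{true_neg:>12,}{false_neg + true_neg:>10,}")
print(f"{'Total':14}{diseased:>10,}{healthy:>12,}{n:>10,}")
print(f"P(disease | positive) = {true_pos / (true_pos + false_pos):.1%}")  # 16.1%
```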
Sequential Updating
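Bayes' theorem can be applied repeatedly: the posterior from one piece of evidence becomes the prior for the next. (Updating one result at a time like this assumes the results are conditionally independent given the true state.)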
Start with prior P(D) = 0.30
Test 1 positive → posterior P(D | T₁+) = 0.75 (becomes new prior)
Test 2 positive → posterior P(D | T₁+, T₂+) = ?
→ Use 0.75 as the new prior, apply Bayes again
Each test result updates our belief incrementally.
This is the foundation of Bayesian statistics.
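The updating loop can be sketched as follows. The text does not state the test's characteristics, so the sensitivity (0.70) and false positive rate (0.10) below are assumptions, picked because they reproduce the 0.30 → 0.75 step; the sketch also assumes the two test results are conditionally independent given disease status.

```python
from fractions import Fraction

def update(prior, p_pos_given_d, p_pos_given_not_d):
    """One Bayes update after a positive test result."""
    numerator = p_pos_given_d * prior
    return numerator / (numerator + p_pos_given_not_d * (1 - prior))

# ASSUMPTION: hypothetical test characteristics, not given in the text,
# chosen so that the first update reproduces 0.30 -> 0.75.
sens = Fraction(7, 10)   # P(positive | D)
fpr = Fraction(1, 10)    # P(positive | no D)

belief = Fraction(3, 10)              # prior P(D) = 0.30
for test_number in (1, 2):
    belief = update(belief, sens, fpr)   # posterior becomes the next prior
    print(test_number, belief, round(float(belief), 3))
# 1 3/4 0.75
# 2 21/22 0.955
```

Under these assumed values, a second positive result would raise the probability from 0.75 to about 0.955.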
Common Mistakes
1. Ignoring base rates (Base Rate Neglect)
"The test is 95% accurate → if positive, I'm 95% likely to have the disease."
WRONG — the 16% calculation above shows this is false for rare diseases.
Always consider the prior probability (prevalence).
2. The Prosecutor's Fallacy
Forensics: P(DNA match | innocent) = 1 in 1,000,000
Prosecutor: "There's only a 1 in 1,000,000 chance the suspect is innocent."
WRONG: P(innocent | DNA match) depends on how many people were in the database
and the prior probability of guilt.
3. Confusing P(A|B) and P(B|A)
P(symptoms | flu) ≠ P(flu | symptoms)
High P(symptom | flu) doesn't mean high P(flu | symptom) if flu is rare.
4. Multiplying dependent probabilities as if independent
P(two defective items from the same supplier)
Sampling without replacement makes the draws dependent: use P(first defective) × P(second defective | first defective), not P × P.
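The Prosecutor's Fallacy (mistake 2) can be made concrete with a rough sketch. It rests on loud assumptions that are not in the text: exactly one guilty person in a pool of possible sources, a uniform prior over that pool, and P(match | guilty) = 1. The pool sizes are hypothetical.

```python
from fractions import Fraction

def p_innocent_given_match(p_match_given_innocent, pool_size):
    """P(innocent | match), ASSUMING exactly one guilty person in the pool,
    a uniform prior over the pool, and P(match | guilty) = 1."""
    prior_guilty = Fraction(1, pool_size)
    prior_innocent = 1 - prior_guilty
    match_and_innocent = p_match_given_innocent * prior_innocent
    match_and_guilty = 1 * prior_guilty
    return match_and_innocent / (match_and_innocent + match_and_guilty)

p_match = Fraction(1, 1_000_000)         # P(DNA match | innocent) from mistake 2
for pool in (100, 10_000, 1_000_000):    # hypothetical pool sizes
    print(pool, float(p_innocent_given_match(p_match, pool)))
```

Under these assumptions, a pool of one million people gives P(innocent | match) of roughly one half, far from the "1 in 1,000,000" the prosecutor claims, while a pool of 100 gives about 1 in 10,000.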
Practice Exercises
- In a city, 40% of drivers are women. Women have accidents 30% of the time; men have accidents 50% of the time. A driver is in an accident. What is the probability the driver is a woman?
- A factory produces 60% of products on Machine A and 40% on Machine B. Machine A produces 3% defectives; Machine B produces 7%. A random product is found defective. Which machine likely made it? (Find P(A|defective) and P(B|defective))
- Are events A and B independent if P(A) = 0.4, P(B) = 0.3, P(A and B) = 0.12? Verify.
- A disease affects 2% of the population. A test has 90% sensitivity and 8% false positive rate. Using a contingency table with n = 100,000, find P(disease | positive test).
- A credit analyst believes there's a 25% chance a borrower will default (prior). An early payment miss (new evidence) has P(miss | will default) = 0.80 and P(miss | will not default) = 0.10. After the missed payment, what is the updated probability of default?
Summary
In this chapter you learned:
- Conditional probability: P(A|B) = P(A and B) / P(B) — probability of A given B occurred; restricts sample space to B
- Multiplication rule (general): P(A and B) = P(A) × P(B|A)
- Independence: P(A|B) = P(A) — equivalent to P(A and B) = P(A)×P(B)
- Independence ≠ mutual exclusivity — mutually exclusive events with positive probability are always dependent
- Law of total probability: P(A) = ΣP(A|Bᵢ)P(Bᵢ) over a partition of S
- Bayes' theorem: P(B|A) = P(A|B)×P(B) / P(A) — updates prior to posterior with evidence
- Prior = belief before evidence; Posterior = belief after evidence
- Base rate neglect: for low-prevalence conditions, P(disease | positive) (the positive predictive value, PPV) stays low even with a sensitive test
- Contingency table method often clearer than formulas for Bayes calculations
- Sequential updating: apply Bayes repeatedly as new evidence arrives
Next up: Probability Distributions — the Binomial and Poisson distributions for counting discrete outcomes.