When to Use Chi-Square
Chi-square (χ²) tests work with categorical data summarised as counts or frequencies, unlike t-tests, which compare means of quantitative data.
Two main uses:
- Goodness-of-fit test: Does this sample match an expected distribution?
- Test of independence: Are two categorical variables related in the population?
The Chi-Square Statistic
The chi-square statistic measures how far observed counts are from expected counts:
χ² = Σ [(O − E)² / E]
Where:
O = Observed count in each category
E = Expected count under H₀
Σ = sum over all categories/cells
Properties:
- Always ≥ 0
- Larger χ² = more difference from H₀ = more evidence against H₀
- Under H₀, approximately follows a χ² distribution, with degrees of freedom that depend on the test
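The formula and its properties can be sketched in a few lines of Python. The function name `chi_square_statistic` and the sample counts are illustrative assumptions, not part of the lesson.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over paired observed and expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# A perfect match gives 0; any mismatch pushes the statistic above 0.
perfect = chi_square_statistic([10, 20, 30], [10, 20, 30])
mismatch = chi_square_statistic([15, 15, 30], [10, 20, 30])
print(perfect, mismatch)
```

Because every term is a squared difference divided by a positive count, the result can never be negative.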
1. Chi-Square Goodness-of-Fit Test
Question: Does the observed distribution of one categorical variable match a hypothesised distribution?
Assumptions
- Random sample
- Expected frequency ≥ 5 in each category (if not, combine categories)
- Observations are independent
Worked Example
Scenario: A die is rolled 120 times. Do the outcomes match a fair die?
Observed:
Face     O (observed)   E (expected = 120/6)
1        24             20
2        18             20
3        22             20
4        26             20
5        16             20
6        14             20
         ──────         ──────
Total    120            120
H₀: The die is fair (each face equally likely, p=1/6)
H₁: The die is not fair
α = 0.05
Step 1: Compute χ²
χ² = (24−20)²/20 + (18−20)²/20 + (22−20)²/20 + (26−20)²/20 + (16−20)²/20 + (14−20)²/20
= 16/20 + 4/20 + 4/20 + 36/20 + 16/20 + 36/20
= 0.8 + 0.2 + 0.2 + 1.8 + 0.8 + 1.8
= 5.6
Step 2: Degrees of freedom
df = k − 1 = 6 − 1 = 5
Step 3: Critical value (or p-value)
From χ² table: χ²(5, α=0.05) = 11.07
Our statistic: χ² = 5.6 < 11.07 → FAIL TO REJECT H₀
p-value ≈ 0.35
Conclusion: No significant evidence that the die is unfair (χ²(5)=5.6, p=0.35).
The observed frequencies are consistent with a fair die.
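The die example can be reproduced without a χ² table. The sketch below is one possible approach using only the standard library: the helper `chi2_sf` (a hypothetical name) computes the upper-tail probability from a series expansion of the regularised incomplete gamma function. In practice a statistics library would normally supply this value.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    half_x = x / 2.0
    # Series for the regularised lower incomplete gamma function P(a, half_x)
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return 1.0 - lower


observed = [24, 18, 22, 26, 16, 14]
n = sum(observed)
expected = [n / len(observed)] * len(observed)  # fair die: equal expected counts

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(stat, df)

print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.2f}")
```

The p-value is well above α = 0.05, which agrees with the critical-value comparison in Step 3.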
Finding Expected Frequencies from a Hypothesised Distribution
H₀ doesn't have to be "uniform." It can be any specified distribution.
Example: Company claims 40% buy Product A, 35% buy B, 25% buy C.
Survey of 200 customers: 90 bought A, 65 bought B, 45 bought C.
Expected counts:
A: 200 × 0.40 = 80
B: 200 × 0.35 = 70
C: 200 × 0.25 = 50
χ² = (90−80)²/80 + (65−70)²/70 + (45−50)²/50
= 100/80 + 25/70 + 25/50
= 1.25 + 0.357 + 0.50
= 2.107
df = 3 − 1 = 2
χ²(2, 0.05) = 5.991
2.107 < 5.991 → fail to reject H₀
Survey data is consistent with the company's claimed distribution.
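The same calculation for a non-uniform H₀ might look like this in Python. The variable names are assumptions made for illustration, and the critical value is copied from the text rather than computed.

```python
observed = {"A": 90, "B": 65, "C": 45}
claimed_share = {"A": 0.40, "B": 0.35, "C": 0.25}

n = sum(observed.values())
# Expected count = sample size x hypothesised proportion
expected = {product: n * share for product, share in claimed_share.items()}

stat = sum((observed[p] - expected[p]) ** 2 / expected[p] for p in observed)
df = len(observed) - 1
critical_value = 5.991  # chi-square critical value for df = 2, alpha = 0.05 (from the text)

print({p: round(e, 2) for p, e in expected.items()})
print(f"chi2 = {stat:.3f}, df = {df}, reject H0: {stat > critical_value}")
```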
2. Chi-Square Test of Independence
Question: Are two categorical variables statistically independent (unrelated)?
H₀: The two variables are independent
H₁: The two variables are NOT independent (there is an association)
Contingency Table
The data is arranged in a two-way table (rows = one variable, columns = another):
Example: Is there a relationship between Department and Job Level?
Department    Junior   Mid-Level   Senior   Total
Finance       20       35          15       70
Technology    15       40          25       80
Marketing     25       20          5        50
Total         60       95          45       200
Expected Frequencies
Under H₀ (independence), the expected frequency for each cell:
E_ij = (Row total_i × Column total_j) / Grand total
E(Finance, Junior) = 70 × 60 / 200 = 21
E(Finance, Mid) = 70 × 95 / 200 = 33.25
E(Finance, Senior) = 70 × 45 / 200 = 15.75
E(Tech, Junior) = 80 × 60 / 200 = 24
E(Tech, Mid) = 80 × 95 / 200 = 38
E(Tech, Senior) = 80 × 45 / 200 = 18
E(Mktg, Junior) = 50 × 60 / 200 = 15
E(Mktg, Mid) = 50 × 95 / 200 = 23.75
E(Mktg, Senior) = 50 × 45 / 200 = 11.25
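The nine calculations above all apply one rule, so they can be generated in a loop. This is a minimal sketch; `expected_table` is an illustrative name, and the table is assumed to be a list of rows of counts.

```python
def expected_table(table):
    """Expected counts under independence: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Finance, Technology, Marketing; columns: Junior, Mid-Level, Senior
observed = [
    [20, 35, 15],
    [15, 40, 25],
    [25, 20, 5],
]

expected = expected_table(observed)
for row in expected:
    print(row)
```

Each row and column of the expected table sums to the same total as in the observed table, which is a quick sanity check on the arithmetic.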
Computing χ²
χ² = Σ [(O − E)² / E]
Cell contributions:
Finance, Junior: (20−21)²/21 = 1/21 = 0.048
Finance, Mid: (35−33.25)²/33.25 = 3.0625/33.25 = 0.092
Finance, Senior: (15−15.75)²/15.75 = 0.5625/15.75 = 0.036
Tech, Junior: (15−24)²/24 = 81/24 = 3.375
Tech, Mid: (40−38)²/38 = 4/38 = 0.105
Tech, Senior: (25−18)²/18 = 49/18 = 2.722
Mktg, Junior: (25−15)²/15 = 100/15 = 6.667
Mktg, Mid: (20−23.75)²/23.75 = 14.0625/23.75 = 0.592
Mktg, Senior: (5−11.25)²/11.25 = 39.0625/11.25 = 3.472
χ² = 0.048 + 0.092 + 0.036 + 3.375 + 0.105 + 2.722 + 6.667 + 0.592 + 3.472 = 17.11
Degrees of freedom = (rows − 1) × (columns − 1) = (3−1) × (3−1) = 2 × 2 = 4
χ²(4, α=0.05) = 9.488
χ² = 17.11 > 9.488 → REJECT H₀
p-value < 0.002
Conclusion: There is a significant association between Department and Job Level.
The distribution of seniority differs across departments.
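The whole test can be sketched as one function. `independence_test` is a hypothetical name. The p-value line uses a closed form for the χ² upper tail that holds only when df is even, which happens to be the case here (df = 4); treat it as an assumption of this sketch rather than a general method.

```python
import math


def independence_test(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected_count = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected_count) ** 2 / expected_count
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


observed = [
    [20, 35, 15],  # Finance
    [15, 40, 25],  # Technology
    [25, 20, 5],   # Marketing
]

stat, df = independence_test(observed)

# Upper-tail probability, valid only for even df:
# exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
half = stat / 2
p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.4f}")
print("reject H0 at alpha = 0.05:", stat > 9.488)  # critical value from the text
```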
Interpreting the Result
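A significant result is evidence that a relationship exists, but it does not say which cells are responsible. To find out, compare observed and expected counts cell by cell:
Standardised (Pearson) residuals = (O − E) / √E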
Tech, Junior: (15−24)/√24 = −9/4.9 = −1.84 (fewer junior than expected)
Tech, Senior: (25−18)/√18 = +7/4.24 = +1.65 (more senior than expected)
Mktg, Junior: (25−15)/√15 = +10/3.87 = +2.58 ← large positive (more junior than expected)
Mktg, Senior: (5−11.25)/√11.25 = −6.25/3.35 = −1.86 (fewer senior than expected)
→ Technology has more senior employees than expected
→ Marketing has more junior employees than expected
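Computing the residual for every cell, not only the four shown, makes it easier to see which cells stand out. The sketch below flags cells whose residual exceeds 2 in absolute value. That cut-off is a common rule of thumb and an assumption here, not something the lesson specifies.

```python
import math

rows = ["Finance", "Technology", "Marketing"]
cols = ["Junior", "Mid-Level", "Senior"]
observed = [
    [20, 35, 15],
    [15, 40, 25],
    [25, 20, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

residuals = {}
for i, row_name in enumerate(rows):
    for j, col_name in enumerate(cols):
        e = row_totals[i] * col_totals[j] / grand_total
        residuals[(row_name, col_name)] = (observed[i][j] - e) / math.sqrt(e)

threshold = 2.0  # rule-of-thumb cut-off (assumption)
flagged = [cell for cell, r in residuals.items() if abs(r) > threshold]

# Print cells from largest to smallest absolute residual
for cell, r in sorted(residuals.items(), key=lambda item: -abs(item[1])):
    marker = "  <-- notable" if cell in flagged else ""
    print(f"{cell[0]:<11}{cell[1]:<10}{r:+.2f}{marker}")
```

With this table only the Marketing/Junior cell passes the cut-off, although the Technology and Marketing/Senior cells are close behind.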
Practical Examples
Example 1: Website Traffic Source Analysis
Expected (from last year's data): Organic=50%, Paid=30%, Referral=15%, Direct=5%
This year's sample (n=500): Organic=230, Paid=160, Referral=85, Direct=25
E: Organic=250, Paid=150, Referral=75, Direct=25
χ² = (230−250)²/250 + (160−150)²/150 + (85−75)²/75 + (25−25)²/25
= 400/250 + 100/150 + 100/75 + 0/25
= 1.6 + 0.667 + 1.333 + 0
= 3.6
df = 4−1 = 3
χ²(3, 0.05) = 7.815
3.6 < 7.815 → fail to reject H₀
Traffic source distribution is not significantly different from last year.
Example 2: Drug Side Effects by Gender
Question: Are side effects associated with gender?
Gender    Side Effects   No Side Effects   Total
Male      45             155               200
Female    35             165               200
Total     80             320               400
Expected:
Male with SE: 200×80/400 = 40
Male without: 200×320/400 = 160
Female with SE: 200×80/400 = 40
Female without: 200×320/400 = 160
χ² = (45−40)²/40 + (155−160)²/160 + (35−40)²/40 + (165−160)²/160
= 25/40 + 25/160 + 25/40 + 25/160
= 0.625 + 0.156 + 0.625 + 0.156
= 1.562
df = (2−1)(2−1) = 1
χ²(1, 0.05) = 3.841
1.562 < 3.841 → fail to reject H₀
No significant association between gender and side effects.
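For a 2×2 table with cells a, b (first row) and c, d (second row), the same statistic can be obtained without computing expected counts, using the standard shortcut χ² = n(ad − bc)² / (row totals × column totals). The function name below is illustrative.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Male: 45 with side effects, 155 without; Female: 35 with, 165 without
stat = chi_square_2x2(45, 155, 35, 165)
print(stat)  # 1.5625
```

The result matches the cell-by-cell sum above, apart from rounding in the hand calculation.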
Example 3: A/B Test Conversion (2×2 Table)
Question: Is conversion rate different between Version A and Version B?
Version      Converted   Not Converted   Total
Version A    120         880             1000
Version B    148         852             1000
Total        268         1732            2000
Expected:
A, Converted: 1000×268/2000 = 134
A, Not: 1000×1732/2000 = 866
B, Converted: 134
B, Not: 866
χ² = (120−134)²/134 + (880−866)²/866 + (148−134)²/134 + (852−866)²/866
= 196/134 + 196/866 + 196/134 + 196/866
= 1.463 + 0.226 + 1.463 + 0.226
= 3.378
df = 1
χ²(1, 0.05) = 3.841
3.378 < 3.841 → fail to reject H₀ (p ≈ 0.066)
The conversion rates are not significantly different at α = 0.05 (p ≈ 0.066). The result is borderline, so a larger sample would be needed to settle the question.
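A/B tests are often analysed with a two-proportion z-test instead. For a 2×2 table the two approaches agree: the squared pooled z statistic equals the uncorrected χ². The sketch below checks this on the Example 3 data and obtains the two-sided p-value from the normal distribution through `math.erfc`. The variable names are assumptions.

```python
import math

conv_a, n_a = 120, 1000
conv_b, n_b = 148, 1000

p_a = conv_a / n_a
p_b = conv_b / n_b
p_pooled = (conv_a + conv_b) / (n_a + n_b)

# Standard error of the difference under H0 (equal conversion rates)
std_err = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / std_err

# Two-sided tail probability of the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.3f}, z^2 = {z * z:.3f}, p = {p_value:.3f}")
```

Here z² comes out at about 3.378, the same value as the χ² statistic computed above.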
For 2×2 Tables: Yates' Correction
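For 2×2 tables (df = 1), Yates' continuity correction is often applied, particularly with small samples, to improve the χ² approximation. It shrinks each |O − E| by 0.5, which makes the test more conservative: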
χ² = Σ [(|O − E| − 0.5)² / E]
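Applied to the A/B table from Example 3, the correction lowers the statistic slightly. The sketch below computes both versions side by side; the variable names are illustrative.

```python
observed = [
    [120, 880],  # Version A: converted, not converted
    [148, 852],  # Version B: converted, not converted
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

uncorrected = 0.0
yates = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        uncorrected += (obs - e) ** 2 / e
        yates += (abs(obs - e) - 0.5) ** 2 / e  # continuity-corrected term

print(f"uncorrected chi2 = {uncorrected:.3f}")
print(f"Yates-corrected chi2 = {yates:.3f}")
```

With samples this large the correction changes little: both values stay below the critical value of 3.841, so the conclusion is unchanged.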
Measures of Association
After rejecting independence, measure the strength of association:
Phi coefficient (2×2 tables):
φ = √(χ²/n) → ranges from 0 to 1
Cramér's V (larger tables):
V = √(χ²/(n × min(r−1, c−1))) → ranges from 0 to 1
Interpretation:
V ≈ 0.1 → weak association
V ≈ 0.3 → moderate association
V ≈ 0.5 → strong association
For our Department-Level example:
V = √(17.11/(200 × min(2,2))) = √(17.11/400) = √0.04278 = 0.207 → weak-to-moderate association (between the 0.1 and 0.3 benchmarks above)
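Both measures reduce to one formula, since φ is Cramér's V with min(r−1, c−1) = 1. A minimal sketch, with an illustrative function name, using statistics computed earlier in this chapter:

```python
import math


def cramers_v(chi2_stat, n, n_rows, n_cols):
    """Cramér's V; for a 2x2 table this equals the phi coefficient."""
    return math.sqrt(chi2_stat / (n * min(n_rows - 1, n_cols - 1)))


# Department x Job Level (3x3 table, n = 200)
v_department = cramers_v(17.11, 200, 3, 3)

# A/B test from Example 3 (2x2 table, n = 2000), so this is phi
phi_ab = cramers_v(3.378, 2000, 2, 2)

print(round(v_department, 3), round(phi_ab, 3))
```

The A/B value is very small, which is consistent with that test failing to reach significance even with 2000 observations.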
Common Mistakes
1. Expected frequency < 5
If any E < 5, the χ² approximation becomes unreliable and the p-value can be misleading.
Fix: Combine small categories, or use Fisher's Exact Test (2×2 tables)
2. Using χ² for quantitative data
χ² works on COUNTS, not means or continuous values.
For comparing means: use t-test or ANOVA.
3. Confusing goodness-of-fit with independence
Goodness-of-fit: ONE variable, comparing to a known distribution (df = k−1)
Independence: TWO variables in a contingency table (df = (r−1)(c−1))
4. Ignoring effect size after significance
With large samples, even tiny associations become significant.
Always compute Cramér's V to assess practical significance.
5. Reading the direction of association from the p-value
χ² only tells you IF there's an association — not WHICH direction.
Look at observed vs expected, or standardised residuals, to understand the nature of the association.
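Mistake 4 can be made concrete with a small experiment. The sketch below takes the gender and side-effects table from Example 2, multiplies every count by 10 so that the proportions stay identical, and compares the results. The function name and the scaling factor are assumptions chosen for illustration.

```python
import math


def chi_square_and_v(table):
    """Return (chi-square statistic, Cramér's V) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for obs, c in zip(row, col_totals)
    )
    v = math.sqrt(stat / (n * min(len(row_totals) - 1, len(col_totals) - 1)))
    return stat, v


base = [[45, 155], [35, 165]]                          # n = 400
scaled = [[10 * count for count in row] for row in base]  # n = 4000, same proportions

for label, table in (("n = 400", base), ("n = 4000", scaled)):
    stat, v = chi_square_and_v(table)
    significant = stat > 3.841  # critical value for df = 1, alpha = 0.05
    print(f"{label}: chi2 = {stat:.2f}, significant: {significant}, V = {v:.4f}")
```

The statistic grows tenfold and crosses the critical value, while V stays the same. The association did not get stronger; the larger sample only made it detectable.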
Practice Exercises
- Roll a die 180 times: Observed: 1→28, 2→32, 3→25, 4→35, 5→20, 6→40. Test if the die is fair (α=0.05).
- 200 customers are surveyed about brand preference: Brand A=65, B=80, C=55. The company claims equal preference (33.3% each). Test this claim (α=0.05).
- Survey of 300 people: Is preferred news source (TV/Online/Print) associated with age group (Young/Middle/Senior)? Set up the contingency table, compute expected values, and test independence.
- For a 2×2 contingency table with χ²=4.5 and n=100, compute Cramér's V. Is this a strong, moderate, or weak association?
- A quality inspector finds χ² = 2.3 with df=3 (p=0.51). A colleague says "the data is perfect." What's wrong with this interpretation?
Summary
In this chapter you learned:
- Chi-square statistic: χ² = Σ[(O−E)²/E] — measures how far observed counts are from expected; always ≥ 0
- Goodness-of-fit test: one variable; df = k−1; compares observed to any hypothesised distribution
- Test of independence: two-way contingency table; df = (r−1)(c−1); E_ij = (row_i × col_j)/n
- Decision rule: χ² > χ²_critical (or p < α) → reject H₀ (the hypothesised distribution, or independence)
- Assumption: all expected frequencies ≥ 5; if not, combine cells or use Fisher's Exact Test
- Cramér's V: measure of association strength after significant χ²; V = √(χ²/(n×min(r−1,c−1)))
- Standardised residuals: (O−E)/√E → identify which cells drive the association
- χ² reveals IF an association exists; look at residuals to understand WHAT the association is
- For quantitative outcomes, use t-tests/ANOVA; for counts/categories, use χ²
Next up: ANOVA — Analysis of Variance for comparing means across three or more groups simultaneously.