Chapter 4 of 18

Descriptive Statistics — Mean, Median & Mode

Summarise the centre of a distribution — arithmetic mean, weighted mean, median, mode, and when to use each measure of central tendency.

Meritshot · 9 min read

What Are Measures of Central Tendency?

A measure of central tendency gives you a single number that represents "the middle" of a distribution. Three measures are used most often:

  • Mean — the arithmetic average
  • Median — the middle value when sorted
  • Mode — the most frequent value

Choosing the right one depends on the variable type, the distribution shape, and what question you're answering.

Sample Dataset

We'll use this dataset throughout:

10 employee salaries (₹ thousands per year):
62, 68, 72, 75, 78, 82, 85, 91, 95, 200

n = 10 observations

The Mean (Arithmetic Average)

Formula

Sample mean:

x̄ = (Σxᵢ) / n = (x₁ + x₂ + ... + xₙ) / n

Population mean:

μ = (Σxᵢ) / N

Calculation

x̄ = (62 + 68 + 72 + 75 + 78 + 82 + 85 + 91 + 95 + 200) / 10
x̄ = 908 / 10
x̄ = 90.8 (₹90,800)
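The same arithmetic can be sketched in Python. This is a minimal illustration, not part of the chapter's own material: the variable names are chosen here for convenience, and `statistics.mean` comes from the Python standard library.

```python
from statistics import mean

# Salaries in ₹ thousands per year (the chapter's sample dataset)
salaries = [62, 68, 72, 75, 78, 82, 85, 91, 95, 200]

# Manual formula: sum of the observations divided by their count
manual_mean = sum(salaries) / len(salaries)

# Standard-library equivalent
library_mean = mean(salaries)

print(manual_mean)   # 90.8
print(library_mean)  # 90.8
```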

Properties of the Mean

  1. Uses all data points — sensitive to every value
  2. Affected by outliers: the 200 outlier pulls the mean from about 78.7 (without it) to 90.8
  3. Algebraically tractable — can be used in further calculations (standard deviation, regression)
  4. The sum of deviations from the mean is always zero: Σ(xᵢ − x̄) = 0
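Two of these properties are easy to check numerically. The sketch below uses the sample salaries; the variable names are illustrative, and the zero-sum check allows for floating-point rounding rather than testing exact equality.

```python
from statistics import mean

salaries = [62, 68, 72, 75, 78, 82, 85, 91, 95, 200]

# Property 2: drop the 200 outlier and compare the two means
without_outlier = [x for x in salaries if x != 200]
print(round(mean(without_outlier), 2))  # 78.67
print(mean(salaries))                   # 90.8

# Property 4: deviations from the mean sum to zero (up to float rounding)
x_bar = mean(salaries)
deviation_sum = sum(x - x_bar for x in salaries)
print(abs(deviation_sum) < 1e-9)        # True
```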

Weighted Mean

When some observations count more than others:

Weighted mean = Σ(wᵢ × xᵢ) / Σwᵢ

Example: Grade calculation
Assessment   Weight  Score
Assignments  20%     82
Midterm      30%     74
Final Exam   50%     88

Weighted mean = (0.20×82 + 0.30×74 + 0.50×88) / (0.20+0.30+0.50)
             = (16.4 + 22.2 + 44.0) / 1.0
             = 82.6
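As a rough sketch, the weighted-mean formula translates into a small helper. The function name `weighted_mean` and its error handling are assumptions made for this illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w). The weights need not sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

scores = [82, 74, 88]         # assignments, midterm, final exam
weights = [0.20, 0.30, 0.50]  # 20%, 30%, 50%

grade = weighted_mean(scores, weights)
print(round(grade, 1))  # 82.6
```

Dividing by the sum of the weights means the same helper works whether weights are given as fractions (0.2, 0.3, 0.5) or as raw counts.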

When to Use the Mean

✓ Symmetric distributions (no severe outliers)
✓ Ratio or interval scale data
✓ When you need to compute further statistics (SD, regression)
✗ Skewed distributions (income, house prices)
✗ Ordinal data
✗ When outliers are present and influential

The Median

Definition

The median is the middle value when all observations are sorted in ascending order.

Calculation

Step 1: Sort the data

Sorted: 62, 68, 72, 75, 78, 82, 85, 91, 95, 200
Positions: 1   2   3   4   5   6   7   8   9   10

Step 2: Find the middle position

  • n = 10 (even number) → median = average of positions 5 and 6
Median = (78 + 82) / 2 = 80

For odd n:

Sorted: 62, 68, 72, 75, 78, 82, 85, 91, 95
n = 9 → middle position = (9+1)/2 = 5th value
Median = 78
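The two cases (even and odd n) can be sketched as a short function. The name `median_by_hand` is hypothetical and used only for illustration; `statistics.median` is the standard-library equivalent.

```python
from statistics import median

def median_by_hand(data):
    """Sort, then take the middle value (odd n) or average the two middle values (even n)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

salaries = [62, 68, 72, 75, 78, 82, 85, 91, 95, 200]

print(median_by_hand(salaries))       # 80.0 (even n: average of 78 and 82)
print(median_by_hand(salaries[:-1]))  # 78   (odd n: 5th of 9 values)
print(median(salaries))               # 80.0 (standard library agrees)
```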

Properties of the Median

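  1. Robust to outliers: the median stays at 80 no matter how large the top salary (200) becomes
  2. Depends only on the middle of the sorted data: the sizes of the other values do not matter, only which side of the middle they fall on
  3. Splits the distribution in half: about 50% of values lie above it and about 50% below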
  4. Less efficient statistically (uses less information than the mean when the distribution is normal)

When to Use the Median

✓ Skewed distributions (income, house prices, healthcare costs)
✓ When outliers are present
✓ Ordinal scale data
✓ When you want the "typical" value
✗ When you need further algebraic manipulation

House price example:
₹45 lakh, ₹52 lakh, ₹58 lakh, ₹63 lakh, ₹250 lakh (luxury flat)

Mean = (45+52+58+63+250)/5 = 93.6 lakh → distorted by luxury flat
Median = 58 lakh → better represents the typical house

The Mode

Definition

The most frequently occurring value in the dataset.

Calculation

Employee grades: A, B, B, C, B, A, C, D, B, A, B
Count: A=3, B=5, C=2, D=1
Mode = B (appears 5 times)

For our salary dataset:

62, 68, 72, 75, 78, 82, 85, 91, 95, 200
→ All values appear once → No mode (or all are modes — depends on convention)
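Counting frequencies is straightforward to sketch in Python. The example below assumes the standard-library `collections.Counter` and `statistics.multimode` (Python 3.8+); the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

grades = ["A", "B", "B", "C", "B", "A", "C", "D", "B", "A", "B"]

# Tally each category, then take the most frequent one
counts = Counter(grades)
top_grade, top_count = counts.most_common(1)[0]
print(top_grade, top_count)  # B 5

# When every value occurs once, multimode returns every value,
# which corresponds to the "all are modes" convention.
salaries = [62, 68, 72, 75, 78, 82, 85, 91, 95, 200]
print(len(multimode(salaries)))  # 10
```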

Multiple Modes

  • Unimodal: One mode
  • Bimodal: Two modes (suggests two distinct subgroups in the data)
  • Multimodal: More than two modes

Test scores: 55, 55, 72, 72, 88
Bimodal: Mode = 55 and 72
→ Might indicate two distinct student groups (struggling and performing well)
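One possible way to label a dataset by its number of modes is sketched below. The helper `classify_modes` is hypothetical, and it adopts the "no mode" convention when every distinct value ties for the highest frequency; other conventions are equally defensible.

```python
from statistics import multimode

def classify_modes(data):
    """Label a dataset by how many values tie for the highest frequency."""
    modes = multimode(data)
    # Assumed convention: if all distinct values tie (or data is empty), report no mode.
    if not modes or (len(modes) > 1 and len(modes) == len(set(data))):
        return "no mode", []
    labels = {1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes), "multimodal"), modes

test_scores = [55, 55, 72, 72, 88]
salaries = [62, 68, 72, 75, 78, 82, 85, 91, 95, 200]

print(classify_modes(test_scores))  # ('bimodal', [55, 72])
print(classify_modes(salaries))     # ('no mode', [])
```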

When to Use the Mode

✓ Nominal data (only measure of centre valid for nominal)
✓ Discrete data (which product size sold most)
✓ When "most popular" is what you need
✗ Continuous data (every value may be unique, so there may be no mode)
✗ When you need a mathematical measure of centre

Comparing Mean, Median, and Mode

Effect of Distribution Shape

Symmetric distribution with a single peak:
Mean = Median = Mode (all three coincide)

Right-skewed (positive skew, tail to the right):
Mode < Median < Mean (the usual ordering)
→ Mean is pulled toward the long right tail
→ Example: Income distribution (few very high earners)

Left-skewed (negative skew, tail to the left):
Mean < Median < Mode (the usual ordering)
→ Mean is pulled toward the long left tail
→ Example: Age at death in developed countries (few very young deaths)

Visual:
Right-skewed:
|
|██
|████
|██████
|████████████████_________ →
     ↑     ↑    ↑
   Mode  Median Mean
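Comparing the mean with the median gives a quick, rough hint about skew direction. The function below is a heuristic sketch (its name and messages are made up for this illustration); it is not a formal skewness statistic and can mislead on small or multimodal samples.

```python
from statistics import mean, median

def skew_hint(data):
    """Rough skew direction from the sign of (mean - median). A heuristic only."""
    gap = mean(data) - median(data)
    if gap > 0:
        return "mean > median: suggests right skew"
    if gap < 0:
        return "mean < median: suggests left skew"
    return "mean == median: no skew suggested"

salaries = [62, 68, 72, 75, 78, 82, 85, 91, 95, 200]
print(skew_hint(salaries))  # mean > median: suggests right skew
```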

The Income Analogy

5 friends' annual incomes (₹ lakhs):
4, 5, 6, 7, 8

Mean = 6, Median = 6, Mode = none

A billionaire joins:
4, 5, 6, 7, 8, 5000

Mean ≈ 838.3 → the "average income" is about ₹838 lakh, which says nothing about a typical friend's income
Median = (6+7)/2 = 6.5 → typical income barely changed — robust!
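The before-and-after comparison can be reproduced in a few lines. This is a sketch with illustrative variable names, using the standard-library `statistics` module.

```python
from statistics import mean, median

friends = [4, 5, 6, 7, 8]            # annual incomes in ₹ lakhs
with_billionaire = friends + [5000]  # the billionaire joins

print(f"before: mean={mean(friends):.1f}, median={median(friends):.1f}")
# before: mean=6.0, median=6.0
print(f"after:  mean={mean(with_billionaire):.1f}, median={median(with_billionaire):.1f}")
# after:  mean=838.3, median=6.5
```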

News reports often quote the mean when describing economic growth (which is boosted by the wealthiest earners) but the median when describing the "middle class". The two figures tell different stories.

Other Types of Means

Geometric Mean

Used for growth rates and multiplicative processes:

Geometric mean = (x₁ × x₂ × ... × xₙ)^(1/n)

Investment returns over 3 years: +20%, −10%, +15%
→ Convert to multipliers: 1.20 × 0.90 × 1.15
→ Geometric mean = (1.20 × 0.90 × 1.15)^(1/3)
= (1.242)^(1/3)
= 1.0749
→ Average annual growth ≈ 7.49%

Arithmetic mean: (20 + (-10) + 15)/3 = 8.33% → OVERESTIMATES actual return

Use geometric mean for: CAGR, compound growth, percentage returns.
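A minimal sketch of the three-year investment example follows, assuming Python 3.8+ for `statistics.geometric_mean`. The variable names are illustrative. The final check confirms the defining property: compounding at the geometric rate for three years reproduces the actual total growth.

```python
from statistics import geometric_mean  # available in Python 3.8+

annual_returns = [0.20, -0.10, 0.15]

# Convert percentage returns to growth multipliers (1 + r)
multipliers = [1 + r for r in annual_returns]

# Geometric mean of the multipliers, minus 1, is the compound annual rate
compound_rate = geometric_mean(multipliers) - 1
arithmetic_rate = sum(annual_returns) / len(annual_returns)

print(f"geometric:  {compound_rate:.2%}")
print(f"arithmetic: {arithmetic_rate:.2%}")

# Check: compounding at the geometric rate reproduces the true 3-year growth
true_growth = 1.20 * 0.90 * 1.15
assert abs((1 + compound_rate) ** 3 - true_growth) < 1e-9
```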

Harmonic Mean

Used for rates and ratios:

Harmonic mean = n / (1/x₁ + 1/x₂ + ... + 1/xₙ)

Speeds: Car travels 60 km/h for half the distance, 40 km/h for the other half
Harmonic mean = 2 / (1/60 + 1/40) = 2 / (0.0167 + 0.025) = 2/0.0417 = 48 km/h

Arithmetic mean would give 50 km/h → WRONG
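The 48 km/h figure can be checked directly from total distance divided by total time. In this sketch the 120 km leg length is an arbitrary assumption (any equal leg length gives the same answer), and `statistics.harmonic_mean` is the standard-library function.

```python
from statistics import harmonic_mean

speeds = [60, 40]  # km/h over two equal-distance legs

# Assumed leg length for the check; the result does not depend on it
leg_km = 120
total_time_h = sum(leg_km / s for s in speeds)  # 2 h + 3 h
average_speed = (leg_km * len(speeds)) / total_time_h

print(average_speed)                    # 48.0
print(round(harmonic_mean(speeds), 6))  # 48.0
```

Note that this applies only when the legs cover equal distances. If the car instead spent equal time at each speed, the arithmetic mean (50 km/h) would be the right average.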

Use harmonic mean for: average speed over equal distances, and averaging ratios such as price-to-earnings (P/E) in stock analysis.

Practical Examples

Example 1: Salary Negotiation

Salary data for 8 analysts in a team (₹ lakhs):
8, 10, 10, 12, 13, 15, 16, 65 (one very senior analyst)

Mean = 18.6 lakh → "average salary is 18.6L" (inflated by outlier)
Median = (12+13)/2 = 12.5 lakh → better represents the typical analyst
Mode = 10 lakh → most common salary
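All three measures for this team can be computed together. The sketch below also counts how many analysts earn less than the mean, which makes the "inflated" point concrete; the variable names are illustrative.

```python
from statistics import mean, median, mode

analyst_salaries = [8, 10, 10, 12, 13, 15, 16, 65]  # ₹ lakhs

summary = {
    "mean": mean(analyst_salaries),
    "median": median(analyst_salaries),
    "mode": mode(analyst_salaries),
}
print(summary)  # {'mean': 18.625, 'median': 12.5, 'mode': 10}

# How many of the 8 analysts earn less than the mean?
below_mean = sum(1 for s in analyst_salaries if s < summary["mean"])
print(below_mean)  # 7
```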

A new analyst asking for "above average" using the mean (18.6L) would be benchmarking against an inflated figure.

Example 2: Product Sales

Units sold per day for 30 days:
20, 22, 19, 21, 20, 23, 20, 25, 20, 24, ...

Mode = 20 units → the most typical daily sales volume for inventory planning
Mean = 21.3 units → for calculating total expected monthly sales

Example 3: Investment Returns

Meritshot Fund annual returns:
Year 1: +15%
Year 2: +8%
Year 3: −5%
Year 4: +20%
Year 5: +10%

Arithmetic mean = (15+8-5+20+10)/5 = 9.6%/year
Geometric mean = (1.15 × 1.08 × 0.95 × 1.20 × 1.10)^(1/5) − 1
                = (1.5575)^(0.2) − 1
                = 1.0927 − 1
                = 9.27%/year

The geometric mean (9.27%) is the actual compound annual growth rate.
The arithmetic mean (9.6%) overstates the actual performance.
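One way to see which average describes the fund's real performance is to grow a notional ₹100 at each rate and compare against what the yearly returns actually produce. This is a sketch under that assumption (the ₹100 starting amount and all variable names are illustrative); `math.prod` and `statistics.geometric_mean` need Python 3.8+.

```python
from math import prod
from statistics import geometric_mean

returns_pct = [15, 8, -5, 20, 10]
multipliers = [1 + r / 100 for r in returns_pct]

# What a notional ₹100 actually becomes after the five years
final_value = 100 * prod(multipliers)

arithmetic_rate = sum(returns_pct) / len(returns_pct) / 100
geometric_rate = geometric_mean(multipliers) - 1

# Grow ₹100 at each "average" rate for five years and compare with reality
via_arithmetic = 100 * (1 + arithmetic_rate) ** 5
via_geometric = 100 * (1 + geometric_rate) ** 5

print(f"actual:         {final_value:.2f}")
print(f"via geometric:  {via_geometric:.2f}")
print(f"via arithmetic: {via_arithmetic:.2f}")
```

The geometric rate reproduces the actual final value, while the arithmetic rate lands above it.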

Common Mistakes

1. Using mean for skewed data

Report: "Average household income in Mumbai = ₹12 lakh"
Reality: A few billionaires pull the mean up; median might be ₹5 lakh
→ Always check skewness before reporting the mean

2. Confusing "average" with "typical"

The mean is not always typical. In a bimodal distribution (e.g., exam scores clustered around 30 and 90 in roughly equal numbers), the mean of about 60 may describe nobody.

3. Mode for continuous data

Heights: 167.1, 167.1, 172.3, 174.8, 181.2 cm
Mode = 167.1 — but this is coincidence; two people happened to be the same height
→ For continuous data, use a histogram and identify the modal class (range)

4. Ignoring weighted means

Average test score across three classes of different sizes:
Class A: 40 students, mean 65
Class B: 60 students, mean 72
Class C: 100 students, mean 80

Wrong: (65+72+80)/3 = 72.3

Correct weighted mean:
= (40×65 + 60×72 + 100×80) / (40+60+100)
= (2600 + 4320 + 8000) / 200
= 14920 / 200
= 74.6
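When only group sizes and group means are available, the overall mean can be recovered by weighting each group mean by its size. The sketch below contrasts that with the naive average of the three class means; the variable names are illustrative.

```python
# (class size, class mean) pairs for classes A, B, and C
classes = [(40, 65), (60, 72), (100, 80)]

# Naive: treats every class as equally important regardless of size
naive = sum(m for _, m in classes) / len(classes)

# Pooled: each class mean is weighted by its number of students
total_students = sum(n for n, _ in classes)
pooled = sum(n * m for n, m in classes) / total_students

print(round(naive, 1))  # 72.3
print(pooled)           # 74.6
```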

Practice Exercises

  1. Find the mean, median, and mode for: 12, 15, 15, 18, 20, 25, 100

  2. An investor earns: Year 1: +50%, Year 2: −33%. Calculate both the arithmetic and geometric mean annual return. Which one correctly describes actual performance?

  3. Three cities have populations: City A: 200,000 (avg income ₹8L), City B: 500,000 (avg income ₹5L), City C: 300,000 (avg income ₹12L). What is the weighted average income across all three cities?

  4. A shoe store sells sizes: 6, 6, 7, 7, 7, 8, 8, 8, 8, 9, 9, 10. Which measure of centre is most useful for inventory planning? Calculate it.

  5. Without doing any calculations, predict whether mean > median or mean < median for:
     a) Distribution of time spent on a customer support call (most calls are short; a few are very long)
     b) Scores on an easy exam where most students score near 90–100

Summary

In this chapter you learned:

  • Mean x̄ = Σxᵢ/n — uses all data; sensitive to outliers; best for symmetric distributions and further computation
  • Median — middle value after sorting; robust to outliers; best for skewed data and ordinal scales
  • Mode — most frequent value; only valid measure for nominal data; useful for "most popular" questions
  • Skewed distributions: Right skew → Mean > Median > Mode; Left skew → Mean < Median < Mode; Symmetric → Mean ≈ Median ≈ Mode
  • Weighted mean = Σ(wᵢxᵢ)/Σwᵢ — when observations have different importance
  • Geometric mean — correct for growth rates and compound returns; arithmetic mean overstates
  • Harmonic mean — correct for rates (speed, P/E ratios)
  • Always plot the data first — the right measure of centre depends on the distribution shape

Next up: Measures of Spread — variance, standard deviation, and IQR — to understand how scattered the data is around its centre.