Statistical Functions

Why Statistical Functions?

Raw numbers don't tell a story — statistics do. Statistical functions help you understand distributions, spot outliers, compare performance, and summarise large datasets into meaningful metrics.

Sample Dataset

A           B           C       D       E
Emp ID      Department  Salary  Rating  Years
EMP-001     Finance     78000   4       3
EMP-002     Technology  95000   5       7
EMP-003     Marketing   68000   3       2
EMP-004     Finance     82000   4       5
EMP-005     Technology  110000  5       9
EMP-006     Marketing   72000   4       4
EMP-007     Finance     91000   5       6
EMP-008     Technology  88000   3       3
EMP-009     HR          62000   4       2
EMP-010     Finance     75000   3       4

SUMIF and SUMIFS (Review and Advanced)

SUMIF with Wildcards

=SUMIF(B2:B11, "Tech*", C2:C11)
→ Sum salaries for any department starting with "Tech"

=SUMIF(B2:B11, "*nology", C2:C11)
→ Sum for departments ending in "nology"

=SUMIF(C2:C11, ">"&AVERAGE(C2:C11), C2:C11)
→ Sum salaries above average (dynamic criteria using cell reference)

SUMIFS with Date Range

-- Sum sales between two dates:
=SUMIFS(Sales!C:C, Sales!A:A, ">="&DATE(2026,1,1), Sales!A:A, "<="&DATE(2026,3,31))
→ Q1 2026 sales total

SUMPRODUCT — The Flexible Alternative

When SUMIFS isn't enough (OR conditions, calculated criteria):

=SUMPRODUCT((B2:B11="Finance") * C2:C11)
→ Sum Finance salaries — same as SUMIF

=SUMPRODUCT(((B2:B11="Finance")+(B2:B11="Technology")) * C2:C11)
→ Sum Finance OR Technology (OR condition — SUMIFS can't do this!)

=SUMPRODUCT((D2:D11>=4) * (C2:C11>80000) * C2:C11)
→ Sum salaries for rating ≥ 4 AND salary > 80000

COUNTIF and COUNTIFS (Advanced)

-- Count using cell reference as criteria:
=COUNTIF(C2:C11, ">"&B15)    → count above threshold in B15

-- Count unique values (array formula, Ctrl+Shift+Enter in old Excel):
=SUMPRODUCT(1/COUNTIF(B2:B11, B2:B11))
→ Count of unique departments

-- Count non-blank:
=COUNTA(B2:B11) - COUNTBLANK(B2:B11)   → same as COUNTA alone

AVERAGEIF and AVERAGEIFS

=AVERAGEIF(B2:B11, "Finance", C2:C11)
→ Average salary for Finance department: (78000+82000+91000+75000)/4 = 81500

=AVERAGEIFS(C2:C11, B2:B11, "Technology", D2:D11, ">=4")
→ Average salary for Technology employees with rating ≥ 4

Descriptive Statistics

Measures of Central Tendency

=AVERAGE(C2:C11)     → arithmetic mean (81100 for our data)
=MEDIAN(C2:C11)      → middle value — less sensitive to outliers (80000)
=MODE(C2:C11)        → most frequent value (might be #N/A if all unique)
=MODE.SNGL(C2:C11)   → single mode (Excel 2010+)
=MODE.MULT(C2:C11)   → all modes if multiple values share the peak

Measures of Spread

=MAX(C2:C11) - MIN(C2:C11)   → range (110000 - 62000 = 48000)

=STDEV(C2:C11)       → sample standard deviation (estimates population from sample)
=STDEV.S(C2:C11)     → same as STDEV (explicit sample version)
=STDEV.P(C2:C11)     → population standard deviation (when you have ALL data)

=VAR.S(C2:C11)       → variance (sample)
=VAR.P(C2:C11)       → variance (population)

Standard deviation tells you: On average, how far is each value from the mean?

If AVERAGE = 81100 and STDEV = 14000:
→ Most salaries fall between 67100 and 95100 (±1 SD)
→ A salary of 110000 is (110000-81100)/14000 = 2.1 standard deviations above average

Percentiles and Quartiles

=PERCENTILE(C2:C11, 0.90)    → 90th percentile salary
=PERCENTILE.INC(C2:C11, 0.5) → same as MEDIAN
=PERCENTILE.EXC(C2:C11, 0.9) → exclusive (doesn't include endpoints)

=QUARTILE(C2:C11, 1)   → Q1 (25th percentile)
=QUARTILE(C2:C11, 2)   → Q2 (50th = median)
=QUARTILE(C2:C11, 3)   → Q3 (75th percentile)

IQR = Q3 - Q1
Upper fence (outlier threshold) = Q3 + 1.5 * IQR
Lower fence = Q1 - 1.5 * IQR

RANK Functions

Rank values within a list:

=RANK(C2, $C$2:$C$11)         → rank of C2 in range (1=highest by default)
=RANK.EQ(C2, $C$2:$C$11, 0)  → 0=descending (highest rank=1), same as RANK
=RANK.EQ(C2, $C$2:$C$11, 1)  → 1=ascending (lowest value=rank 1)
=RANK.AVG(C2, $C$2:$C$11)    → averages rank when there are ties

Example: Rank each employee's salary:

F2: =RANK.EQ(C2, $C$2:$C$11)
→ EMP-002 (95000) → 3rd, EMP-005 (110000) → 1st, EMP-009 (62000) → 10th

LARGE and SMALL

Return the nth largest or smallest value:

=LARGE(C2:C11, 1)    → highest salary: 110000
=LARGE(C2:C11, 2)    → 2nd highest: 95000
=LARGE(C2:C11, 3)    → 3rd highest: 91000

=SMALL(C2:C11, 1)    → lowest: 62000
=SMALL(C2:C11, 2)    → 2nd lowest: 68000

Top 3 salaries:

=LARGE($C$2:$C$11, ROW(A1))   → drag down 3 rows: returns 1st, 2nd, 3rd largest

MIN, MAX with Conditions

Standard MIN/MAX don't take criteria — work around with MINIFS/MAXIFS (Excel 2019+) or array formulas:

=MINIFS(C2:C11, B2:B11, "Finance")    → lowest Finance salary
=MAXIFS(C2:C11, B2:B11, "Technology") → highest Tech salary

-- Pre-2019 equivalent (Ctrl+Shift+Enter array formula):
=MIN(IF(B2:B11="Finance", C2:C11))

Correlation and Regression

=CORREL(C2:C11, E2:E11)    → correlation between salary and years: how related are they?
→ -1 = perfect inverse, 0 = no relationship, +1 = perfect positive correlation

=SLOPE(C2:C11, E2:E11)     → slope of the best-fit line (salary per year of experience)
=INTERCEPT(C2:C11, E2:E11) → y-intercept
=RSQ(C2:C11, E2:E11)       → R-squared (how well years predicts salary; 0–1)

FREQUENCY — Distribution Analysis

Count how many values fall in each bucket:

Bins in F2:F5: 70000, 80000, 90000, 100000

-- Select G2:G6 (one more row than bins), enter formula, Ctrl+Shift+Enter:
{=FREQUENCY(C2:C11, F2:F5)}
→ G2: count < 70000
→ G3: count 70001-80000
→ G4: count 80001-90000
→ G5: count 90001-100000
→ G6: count > 100000

Z-Score (Standardisation)

How many standard deviations is a value from the mean?

=STANDARDIZE(C2, AVERAGE($C$2:$C$11), STDEV.S($C$2:$C$11))

-- Or manually:
=(C2 - AVERAGE($C$2:$C$11)) / STDEV.S($C$2:$C$11)

→ Z-score of 2 = 2 standard deviations above average
→ Z-score of -1 = 1 SD below average
→ |Z| > 2 typically flags as an outlier

Practical Examples

Example 1: Salary Band Analysis

Summary table:

Department | Count | Avg Salary | Min | Max | Median | Std Dev
Finance    | =COUNTIF(B:B,"Finance") | =AVERAGEIF(B:B,"Finance",C:C) | =MINIFS(C:C,B:B,"Finance") | =MAXIFS(C:C,B:B,"Finance") | ...
Technology | ...
Marketing  | ...

Example 2: Performance Distribution

-- How many employees in each rating?
=COUNTIF($D$2:$D$11, 1)   → count of rating 1
=COUNTIF($D$2:$D$11, 2)   → count of rating 2
... 
=COUNTIF($D$2:$D$11, 5)

-- Average salary per rating band:
=AVERAGEIF($D$2:$D$11, 5, $C$2:$C$11)  → average salary of top performers

Example 3: Outlier Detection

Q1 = =QUARTILE($C$2:$C$11, 1)
Q3 = =QUARTILE($C$2:$C$11, 3)
IQR = Q3 - Q1

Upper fence = Q3 + 1.5 * IQR
Lower fence = Q1 - 1.5 * IQR

-- Flag outliers:
=IF(OR(C2 > (Q3+1.5*IQR), C2 < (Q1-1.5*IQR)), "Outlier", "Normal")

Example 4: Sales Performance Ranking Report

A = Salesperson, B = Monthly Revenue

Rank: =RANK.EQ(B2, $B$2:$B$11)
Top Quartile: =IF(B2 >= PERCENTILE($B$2:$B$11, 0.75), "Top 25%", "")
Z-Score: =(B2-AVERAGE($B$2:$B$11))/STDEV.S($B$2:$B$11)

Common Mistakes

1. STDEV vs STDEV.P confusion

STDEV / STDEV.S → sample (use when data is a sample of a larger population)
STDEV.P         → population (use when data IS the entire population)

For most business analyses (employee sample, sales sample), use STDEV.S.

2. AVERAGE ignores empty cells but not zero

=AVERAGE(0, 0, 100)   → 33.3 (zeros count!)
=AVERAGE("", "", 100) → 100 (text/blanks excluded automatically)

If you want to average only non-zero values:

=AVERAGEIF(C2:C11, "<>0")

3. RANK gives same rank to ties but skips the next rank

Values: 100, 100, 80
RANK: 1, 1, 3   (not 1, 1, 2!)
Use RANK.AVG for fair tie-breaking: 1.5, 1.5, 3

4. PERCENTILE vs QUARTILE

=QUARTILE(range, 1)     = =PERCENTILE(range, 0.25)
=QUARTILE(range, 2)     = =PERCENTILE(range, 0.5) = MEDIAN
=QUARTILE(range, 3)     = =PERCENTILE(range, 0.75)

Practice Exercises

For each department in the sample data, calculate: count, average salary, minimum, maximum, and standard deviation.
Rank all employees by salary (1 = highest) and add a "Top 25%" flag using PERCENTILE.
Identify salary outliers using the IQR method (values beyond Q1 - 1.5×IQR or Q3 + 1.5×IQR).
Calculate the correlation between years of experience and salary. Is there a strong relationship?
Use FREQUENCY to create a salary distribution showing how many employees fall in each ₹10,000 bracket from ₹60,000 to ₹120,000.

Summary

In this chapter you learned:

SUMIF/SUMIFS with wildcards and dynamic criteria; SUMPRODUCT for OR conditions
AVERAGEIF/AVERAGEIFS — conditional average
MEDIAN — outlier-resistant central tendency; MODE.SNGL — most frequent value
STDEV.S (sample) vs STDEV.P (population); VAR.S — variance
PERCENTILE(range, k) — k as decimal (0–1); QUARTILE(range, 1/2/3) for Q1/Q2/Q3
RANK.EQ(value, range) — rank descending; RANK.AVG — average rank on ties
LARGE(range, n) / SMALL(range, n) — nth largest/smallest
MINIFS / MAXIFS — conditional min and max (Excel 2019+)
CORREL — correlation coefficient; SLOPE, INTERCEPT, RSQ for regression
FREQUENCY — distribution buckets (array formula)
Z-score: (value - mean) / stdev — how many standard deviations from average

Next up: Sorting & Filtering — organise and slice your data with sort, AutoFilter, and Advanced Filter.