Why Statistical Functions?
Raw numbers don't tell a story — statistics do. Statistical functions help you understand distributions, spot outliers, compare performance, and summarise large datasets into meaningful metrics.
Sample Dataset
A B C D E
Emp ID Department Salary Rating Years
EMP-001 Finance 78000 4 3
EMP-002 Technology 95000 5 7
EMP-003 Marketing 68000 3 2
EMP-004 Finance 82000 4 5
EMP-005 Technology 110000 5 9
EMP-006 Marketing 72000 4 4
EMP-007 Finance 91000 5 6
EMP-008 Technology 88000 3 3
EMP-009 HR 62000 4 2
EMP-010 Finance 75000 3 4
SUMIF and SUMIFS (Review and Advanced)
SUMIF with Wildcards
=SUMIF(B2:B11, "Tech*", C2:C11)
→ Sum salaries for any department starting with "Tech"
=SUMIF(B2:B11, "*nology", C2:C11)
→ Sum for departments ending in "nology"
=SUMIF(C2:C11, ">"&AVERAGE(C2:C11), C2:C11)
→ Sum salaries above average (dynamic criteria using cell reference)
SUMIFS with Date Range
-- Sum sales between two dates:
=SUMIFS(Sales!C:C, Sales!A:A, ">="&DATE(2026,1,1), Sales!A:A, "<="&DATE(2026,3,31))
→ Q1 2026 sales total
SUMPRODUCT — The Flexible Alternative
When SUMIFS isn't enough (OR conditions, calculated criteria):
=SUMPRODUCT((B2:B11="Finance") * C2:C11)
→ Sum Finance salaries — same as SUMIF
=SUMPRODUCT(((B2:B11="Finance")+(B2:B11="Technology")) * C2:C11)
→ Sum Finance OR Technology (OR condition — SUMIFS can't do this!)
=SUMPRODUCT((D2:D11>=4) * (C2:C11>80000) * C2:C11)
→ Sum salaries for rating ≥ 4 AND salary > 80000
COUNTIF and COUNTIFS (Advanced)
-- Count using cell reference as criteria:
=COUNTIF(C2:C11, ">"&B15) → count above threshold in B15
-- Count unique values (array formula, Ctrl+Shift+Enter in old Excel):
=SUMPRODUCT(1/COUNTIF(B2:B11, B2:B11))
→ Count of unique departments
-- Count non-blank:
=COUNTA(B2:B11) - COUNTBLANK(B2:B11) → same as COUNTA alone
AVERAGEIF and AVERAGEIFS
=AVERAGEIF(B2:B11, "Finance", C2:C11)
→ Average salary for Finance department: (78000+82000+91000+75000)/4 = 81500
=AVERAGEIFS(C2:C11, B2:B11, "Technology", D2:D11, ">=4")
→ Average salary for Technology employees with rating ≥ 4
Descriptive Statistics
Measures of Central Tendency
=AVERAGE(C2:C11) → arithmetic mean (81100 for our data)
=MEDIAN(C2:C11) → middle value — less sensitive to outliers (80000)
=MODE(C2:C11) → most frequent value (might be #N/A if all unique)
=MODE.SNGL(C2:C11) → single mode (Excel 2010+)
=MODE.MULT(C2:C11) → all modes if multiple values share the peak
Measures of Spread
=MAX(C2:C11) - MIN(C2:C11) → range (110000 - 62000 = 48000)
=STDEV(C2:C11) → sample standard deviation (estimates population from sample)
=STDEV.S(C2:C11) → same as STDEV (explicit sample version)
=STDEV.P(C2:C11) → population standard deviation (when you have ALL data)
=VAR.S(C2:C11) → variance (sample)
=VAR.P(C2:C11) → variance (population)
Standard deviation tells you: On average, how far is each value from the mean?
If AVERAGE = 81100 and STDEV = 14000:
→ Most salaries fall between 67100 and 95100 (±1 SD)
→ A salary of 110000 is (110000-81100)/14000 = 2.1 standard deviations above average
Percentiles and Quartiles
=PERCENTILE(C2:C11, 0.90) → 90th percentile salary
=PERCENTILE.INC(C2:C11, 0.5) → same as MEDIAN
=PERCENTILE.EXC(C2:C11, 0.9) → exclusive (doesn't include endpoints)
=QUARTILE(C2:C11, 1) → Q1 (25th percentile)
=QUARTILE(C2:C11, 2) → Q2 (50th = median)
=QUARTILE(C2:C11, 3) → Q3 (75th percentile)
IQR = Q3 - Q1
Upper fence (outlier threshold) = Q3 + 1.5 * IQR
Lower fence = Q1 - 1.5 * IQR
RANK Functions
Rank values within a list:
=RANK(C2, $C$2:$C$11) → rank of C2 in range (1=highest by default)
=RANK.EQ(C2, $C$2:$C$11, 0) → 0=descending (highest rank=1), same as RANK
=RANK.EQ(C2, $C$2:$C$11, 1) → 1=ascending (lowest value=rank 1)
=RANK.AVG(C2, $C$2:$C$11) → averages rank when there are ties
Example: Rank each employee's salary:
F2: =RANK.EQ(C2, $C$2:$C$11)
→ EMP-002 (95000) → 3rd, EMP-005 (110000) → 1st, EMP-009 (62000) → 10th
LARGE and SMALL
Return the nth largest or smallest value:
=LARGE(C2:C11, 1) → highest salary: 110000
=LARGE(C2:C11, 2) → 2nd highest: 95000
=LARGE(C2:C11, 3) → 3rd highest: 91000
=SMALL(C2:C11, 1) → lowest: 62000
=SMALL(C2:C11, 2) → 2nd lowest: 68000
Top 3 salaries:
=LARGE($C$2:$C$11, ROW(A1)) → drag down 3 rows: returns 1st, 2nd, 3rd largest
MIN, MAX with Conditions
Standard MIN/MAX don't take criteria — work around with MINIFS/MAXIFS (Excel 2019+) or array formulas:
=MINIFS(C2:C11, B2:B11, "Finance") → lowest Finance salary
=MAXIFS(C2:C11, B2:B11, "Technology") → highest Tech salary
-- Pre-2019 equivalent (Ctrl+Shift+Enter array formula):
=MIN(IF(B2:B11="Finance", C2:C11))
Correlation and Regression
=CORREL(C2:C11, E2:E11) → correlation between salary and years: how related are they?
→ -1 = perfect inverse, 0 = no relationship, +1 = perfect positive correlation
=SLOPE(C2:C11, E2:E11) → slope of the best-fit line (salary per year of experience)
=INTERCEPT(C2:C11, E2:E11) → y-intercept
=RSQ(C2:C11, E2:E11) → R-squared (how well years predicts salary; 0–1)
FREQUENCY — Distribution Analysis
Count how many values fall in each bucket:
Bins in F2:F5: 70000, 80000, 90000, 100000
-- Select G2:G6 (one more row than bins), enter formula, Ctrl+Shift+Enter:
{=FREQUENCY(C2:C11, F2:F5)}
→ G2: count < 70000
→ G3: count 70001-80000
→ G4: count 80001-90000
→ G5: count 90001-100000
→ G6: count > 100000
Z-Score (Standardisation)
How many standard deviations is a value from the mean?
=STANDARDIZE(C2, AVERAGE($C$2:$C$11), STDEV.S($C$2:$C$11))
-- Or manually:
=(C2 - AVERAGE($C$2:$C$11)) / STDEV.S($C$2:$C$11)
→ Z-score of 2 = 2 standard deviations above average
→ Z-score of -1 = 1 SD below average
→ |Z| > 2 typically flags as an outlier
Practical Examples
Example 1: Salary Band Analysis
Summary table:
Department | Count | Avg Salary | Min | Max | Median | Std Dev
Finance | =COUNTIF(B:B,"Finance") | =AVERAGEIF(B:B,"Finance",C:C) | =MINIFS(C:C,B:B,"Finance") | =MAXIFS(C:C,B:B,"Finance") | ...
Technology | ...
Marketing | ...
Example 2: Performance Distribution
-- How many employees in each rating?
=COUNTIF($D$2:$D$11, 1) → count of rating 1
=COUNTIF($D$2:$D$11, 2) → count of rating 2
...
=COUNTIF($D$2:$D$11, 5)
-- Average salary per rating band:
=AVERAGEIF($D$2:$D$11, 5, $C$2:$C$11) → average salary of top performers
Example 3: Outlier Detection
Q1 = =QUARTILE($C$2:$C$11, 1)
Q3 = =QUARTILE($C$2:$C$11, 3)
IQR = Q3 - Q1
Upper fence = Q3 + 1.5 * IQR
Lower fence = Q1 - 1.5 * IQR
-- Flag outliers:
=IF(OR(C2 > (Q3+1.5*IQR), C2 < (Q1-1.5*IQR)), "Outlier", "Normal")
Example 4: Sales Performance Ranking Report
A = Salesperson, B = Monthly Revenue
Rank: =RANK.EQ(B2, $B$2:$B$11)
Top Quartile: =IF(B2 >= PERCENTILE($B$2:$B$11, 0.75), "Top 25%", "")
Z-Score: =(B2-AVERAGE($B$2:$B$11))/STDEV.S($B$2:$B$11)
Common Mistakes
1. STDEV vs STDEV.P confusion
STDEV / STDEV.S → sample (use when data is a sample of a larger population)
STDEV.P → population (use when data IS the entire population)
For most business analyses (employee sample, sales sample), use STDEV.S.
2. AVERAGE ignores empty cells but not zero
=AVERAGE(0, 0, 100) → 33.3 (zeros count!)
=AVERAGE("", "", 100) → 100 (text/blanks excluded automatically)
If you want to average only non-zero values:
=AVERAGEIF(C2:C11, "<>0")
3. RANK gives same rank to ties but skips the next rank
Values: 100, 100, 80
RANK: 1, 1, 3 (not 1, 1, 2!)
Use RANK.AVG for fair tie-breaking: 1.5, 1.5, 3
4. PERCENTILE vs QUARTILE
=QUARTILE(range, 1) = =PERCENTILE(range, 0.25)
=QUARTILE(range, 2) = =PERCENTILE(range, 0.5) = MEDIAN
=QUARTILE(range, 3) = =PERCENTILE(range, 0.75)
Practice Exercises
- For each department in the sample data, calculate: count, average salary, minimum, maximum, and standard deviation.
- Rank all employees by salary (1 = highest) and add a "Top 25%" flag using PERCENTILE.
- Identify salary outliers using the IQR method (values beyond Q1 - 1.5×IQR or Q3 + 1.5×IQR).
- Calculate the correlation between years of experience and salary. Is there a strong relationship?
- Use FREQUENCY to create a salary distribution showing how many employees fall in each ₹10,000 bracket from ₹60,000 to ₹120,000.
Summary
In this chapter you learned:
SUMIF/SUMIFSwith wildcards and dynamic criteria;SUMPRODUCTfor OR conditionsAVERAGEIF/AVERAGEIFS— conditional averageMEDIAN— outlier-resistant central tendency;MODE.SNGL— most frequent valueSTDEV.S(sample) vsSTDEV.P(population);VAR.S— variancePERCENTILE(range, k)— k as decimal (0–1);QUARTILE(range, 1/2/3)for Q1/Q2/Q3RANK.EQ(value, range)— rank descending;RANK.AVG— average rank on tiesLARGE(range, n)/SMALL(range, n)— nth largest/smallestMINIFS/MAXIFS— conditional min and max (Excel 2019+)CORREL— correlation coefficient;SLOPE,INTERCEPT,RSQfor regressionFREQUENCY— distribution buckets (array formula)- Z-score:
(value - mean) / stdev— how many standard deviations from average
Next up: Sorting & Filtering — organise and slice your data with sort, AutoFilter, and Advanced Filter.