Fundamentals of Statistics — Interview Questions & Answers

50 essential statistics interview questions covering descriptive stats, probability, distributions, hypothesis testing, and regression.


Descriptive Statistics

1. What is the difference between descriptive and inferential statistics?

Descriptive statistics summarize and organize data from a sample or population using measures like mean, median, and standard deviation, without drawing conclusions beyond the data at hand. Inferential statistics use sample data to make generalizations or predictions about a larger population, relying on techniques such as hypothesis testing and confidence intervals.

2. What are the measures of central tendency?

The three main measures of central tendency are mean (arithmetic average), median (the middle value when data is sorted), and mode (the most frequently occurring value). The choice of measure depends on the data distribution; the mean is sensitive to outliers, while the median is more robust for skewed data.

3. When should you use the median instead of the mean?

The median is preferred when the data is skewed or contains extreme outliers, because the mean can be heavily influenced by unusually large or small values. For example, median household income is more representative than mean income because a few extremely wealthy individuals can inflate the average.
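
A quick sketch of this effect using Python's standard-library `statistics` module, with made-up income figures:

```python
from statistics import mean, median

# Hypothetical incomes: one extreme value inflates the mean
incomes = [30_000, 35_000, 40_000, 45_000, 1_000_000]

print(mean(incomes))    # 230000 -- pulled up by the outlier
print(median(incomes))  # 40000  -- robust to the outlier
```

The single millionaire moves the mean far above what any typical household earns, while the median stays in the middle of the bulk of the data.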

4. What is the mode, and can a dataset have more than one?

The mode is the value that appears most frequently in a dataset. A dataset can be unimodal (one mode), bimodal (two modes), multimodal (more than two modes), or have no mode at all if every value occurs with equal frequency.

5. What is the range, and what are its limitations?

The range is the difference between the maximum and minimum values in a dataset, providing a simple measure of spread. Its main limitation is that it only considers two extreme data points and is highly sensitive to outliers, so it does not capture how the rest of the data is distributed.

6. What is variance, and how does it relate to standard deviation?

Variance measures the average squared deviation of each data point from the mean, quantifying how spread out the data is. Standard deviation is the square root of the variance and is expressed in the same units as the original data, making it more interpretable for practical use.

7. What is the difference between population and sample standard deviation?

Population standard deviation divides the sum of squared deviations by N (the total number of data points), while sample standard deviation divides by n - 1 to correct for bias in estimating the population parameter. This correction, known as Bessel's correction, accounts for the fact that a sample tends to underestimate the true population variability.
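
The two estimators are exposed directly in Python's `statistics` module, which makes the effect of Bessel's correction easy to see on a small example dataset:

```python
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32

# Population standard deviation: divides the 32 by N = 8
print(pstdev(data))  # 2.0

# Sample standard deviation: divides by n - 1 = 7 (Bessel's correction)
print(stdev(data))   # ~2.138, slightly larger
```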

8. What are percentiles and quartiles?

A percentile indicates the value below which a given percentage of observations fall; for example, the 90th percentile means 90% of the data lies below that value. Quartiles divide the data into four equal parts: Q1 (25th percentile), Q2 (50th percentile or median), and Q3 (75th percentile).

9. What is the interquartile range (IQR), and how is it used to detect outliers?

The IQR is the difference between the third quartile (Q3) and the first quartile (Q1), representing the spread of the middle 50% of the data. A common rule for detecting outliers is that any value below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is considered an outlier.
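
A minimal sketch of the 1.5 * IQR rule with `statistics.quantiles` (the `"inclusive"` method corresponds to the common linear-interpolation definition of quartiles):

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

q1, _, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag any value outside the fences
outliers = [x for x in data if x < lower or x > upper]
print(outliers)  # [100]
```

Note that different quartile conventions (exclusive vs. inclusive interpolation) can shift the fences slightly on small samples.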

10. What are skewness and kurtosis?

Skewness measures the asymmetry of a distribution: positive skew means the right tail is longer, and negative skew means the left tail is longer. Kurtosis measures the "tailedness" or peakedness of a distribution, with higher kurtosis indicating heavier tails and more extreme outliers compared to a normal distribution.

Probability Fundamentals

11. What is probability?

Probability is a numerical measure between 0 and 1 that quantifies the likelihood of an event occurring, where 0 means the event is impossible and 1 means it is certain. It forms the mathematical foundation for statistical inference, allowing us to reason about uncertainty and make predictions based on data.

12. What are the three main types of probability?

Classical (theoretical) probability assumes equally likely outcomes and is calculated as the number of favorable outcomes divided by the total number of outcomes. Empirical (experimental) probability is based on observed data from experiments, while subjective probability reflects personal belief or judgment about the likelihood of an event.

13. What is conditional probability?

Conditional probability is the probability of an event A occurring given that another event B has already occurred, denoted as P(A | B). It is calculated as P(A and B) / P(B), provided that P(B) is not zero, and it is fundamental to understanding how new information changes the likelihood of outcomes.

14. What is Bayes' theorem, and why is it important?

Bayes' theorem provides a way to update the probability of a hypothesis based on new evidence, expressed as P(A | B) = P(B | A) * P(A) / P(B). It is widely used in machine learning, medical diagnostics, spam filtering, and any domain where prior knowledge must be combined with new data to make informed decisions.
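
A classic illustration is disease screening. The numbers below are made up for illustration, but the structure — prior, likelihood, and the law of total probability in the denominator — is exactly Bayes' theorem:

```python
# Hypothetical screening-test numbers (illustrative only)
prevalence = 0.01        # P(disease)
sensitivity = 0.95       # P(positive | disease)
false_positive = 0.05    # P(positive | no disease)

# Law of total probability: P(positive)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes' theorem: P(disease | positive)
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 3))  # ~0.161
```

Even with a fairly accurate test, a positive result implies only about a 16% chance of disease here, because the condition is rare — a common interview talking point about base rates.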

15. What is the difference between independent and dependent events?

Two events are independent if the occurrence of one does not affect the probability of the other; for example, two successive coin flips. Dependent events are those where the outcome of one event influences the probability of the other, such as drawing cards from a deck without replacement.

16. What are mutually exclusive events?

Mutually exclusive events are events that cannot occur at the same time; if one happens, the other cannot. For example, rolling a 3 and rolling a 5 on a single die roll are mutually exclusive. For such events, P(A or B) = P(A) + P(B).

17. What is the complement rule in probability?

The complement rule states that the probability of an event not occurring equals one minus the probability of the event occurring, expressed as P(A') = 1 - P(A). This rule is especially useful when calculating the probability of "at least one" occurrence, as it is often easier to compute the complement.
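
A worked "at least one" example: rather than summing the probabilities of one, two, three, or four sixes, compute the complement of "no sixes at all":

```python
# P(at least one six in four rolls of a fair die) via the complement
p_no_six_each_roll = 5 / 6
p_at_least_one_six = 1 - p_no_six_each_roll ** 4
print(round(p_at_least_one_six, 4))  # 0.5177
```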

18. What is the addition rule of probability?

The addition rule calculates the probability that at least one of two events occurs. For any two events, P(A or B) = P(A) + P(B) - P(A and B), where the last term prevents double-counting the overlap. If the events are mutually exclusive, the overlap is zero.

19. What is the multiplication rule of probability?

The multiplication rule calculates the probability that two events both occur. For independent events, P(A and B) = P(A) * P(B), and for dependent events, P(A and B) = P(A) * P(B | A). This rule is essential for computing joint probabilities in multi-step experiments.

20. What is the difference between permutations and combinations?

Permutations count the number of ways to arrange items where order matters, while combinations count the number of ways to choose items where order does not matter. For n items taken r at a time, the number of permutations is n! / (n - r)! and the number of combinations is n! / (r! * (n - r)!).
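
Both formulas are available directly in Python's `math` module:

```python
from math import comb, perm

# Arrangements of 3 items from 5 where order matters: 5!/(5-3)! = 60
print(perm(5, 3))  # 60

# Selections of 3 items from 5 where order does not matter: 5!/(3! * 2!) = 10
print(comb(5, 3))  # 10
```

Note that every combination of 3 items corresponds to 3! = 6 permutations, which is why 60 / 6 = 10.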

Probability Distributions

21. What is a probability distribution?

A probability distribution describes how the probabilities of different outcomes are spread across the possible values of a random variable. Discrete distributions assign probabilities to distinct values (e.g., number of heads in coin flips), while continuous distributions describe probabilities over a range of values using a probability density function.

22. What is the normal distribution, and why is it important?

The normal distribution is a symmetric, bell-shaped continuous distribution defined by its mean and standard deviation, where approximately 68% of data falls within one standard deviation of the mean, 95% within two, and 99.7% within three. It is important because many natural phenomena follow this distribution, and it serves as the basis for many statistical tests and the central limit theorem.

23. What is the binomial distribution?

The binomial distribution models the number of successes in a fixed number of independent trials, where each trial has only two possible outcomes (success or failure) with a constant probability of success. It is defined by two parameters: n (number of trials) and p (probability of success), with a mean of n * p and variance of n * p * (1 - p).
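
The probability mass function follows directly from the combination formula, so a minimal stdlib sketch is straightforward:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 3 heads in 10 fair coin flips
print(binomial_pmf(3, 10, 0.5))  # 120 / 1024 ~ 0.117

# The distribution's mean matches the closed form n * p = 5
mu = sum(k * binomial_pmf(k, 10, 0.5) for k in range(11))
print(round(mu, 6))  # 5.0
```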

24. What is the Poisson distribution?

The Poisson distribution models the number of events occurring in a fixed interval of time or space, given a known constant average rate. It is defined by a single parameter lambda (the average rate), and it is commonly used for rare events such as the number of customer arrivals per hour or the number of defects in a manufacturing process.
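
The Poisson PMF, P(X = k) = lambda^k * e^(-lambda) / k!, is equally easy to sketch with the standard library (the arrival rate below is made up):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) given an average rate lam per interval."""
    return lam ** k * exp(-lam) / factorial(k)

# With an average of 4 customer arrivals per hour,
# the probability of exactly 2 arrivals in a given hour:
print(round(poisson_pmf(2, 4), 4))  # 0.1465
```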

25. What is the uniform distribution?

In a uniform distribution, all outcomes are equally likely. The discrete uniform distribution assigns equal probability to each of a finite number of values (e.g., rolling a fair die), while the continuous uniform distribution has a constant probability density between two bounds a and b, with a mean of (a + b) / 2.

26. What is a standard normal distribution, and what is a z-score?

The standard normal distribution is a special case of the normal distribution with a mean of 0 and a standard deviation of 1. A z-score measures how many standard deviations a data point is from the mean, calculated as z = (x - mean) / standard deviation, and it allows comparison of values from different normal distributions.
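
`statistics.NormalDist` covers both the standardization step and the CDF lookup. The exam-score numbers below are hypothetical:

```python
from statistics import NormalDist

# Exam scores assumed normal with mean 70 and standard deviation 10
dist = NormalDist(mu=70, sigma=10)

# z-score of a score of 85: how many standard deviations above the mean
z = (85 - dist.mean) / dist.stdev
print(z)  # 1.5

# Fraction of scores below 85, via the standard normal CDF
print(round(NormalDist().cdf(z), 4))  # 0.9332
```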

27. What is the central limit theorem?

The central limit theorem states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the shape of the original population distribution. This is a cornerstone of inferential statistics because it justifies the use of normal-distribution-based methods even when the underlying data is not normally distributed, typically requiring a sample size of 30 or more.
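
The theorem is easy to demonstrate by simulation. Here we draw sample means from an exponential population, which is heavily right-skewed, and watch them concentrate around the population mean with a spread close to sigma / sqrt(n):

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is reproducible

# Skewed population: exponential with mean 1 and standard deviation 1
sample_means = [
    mean(random.expovariate(1.0) for _ in range(50))
    for _ in range(2000)
]

# The 2000 sample means cluster near the population mean of 1.0,
# with a standard deviation near 1 / sqrt(50) ~ 0.141
print(round(mean(sample_means), 2))
print(round(stdev(sample_means), 2))
```

Plotting `sample_means` as a histogram would show the familiar bell shape despite the skewed source distribution.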

28. What is a sampling distribution?

A sampling distribution is the probability distribution of a statistic (such as the mean or proportion) obtained from all possible samples of a given size drawn from a population. The sampling distribution of the mean has the same mean as the population but a smaller standard deviation, called the standard error, equal to the population standard deviation divided by the square root of the sample size.

29. What is the law of large numbers?

The law of large numbers states that as the sample size increases, the sample mean converges to the population mean. In practical terms, this means that larger samples provide more reliable estimates of population parameters, and the variability of the sample mean decreases as more observations are collected.

30. What is the t-distribution, and when is it used?

The t-distribution is a symmetric, bell-shaped distribution that is similar to the normal distribution but has heavier tails, meaning it assigns more probability to extreme values. It is used instead of the normal distribution when the sample size is small (typically n is less than 30) and the population standard deviation is unknown, with the shape becoming closer to the normal distribution as the degrees of freedom increase.

Hypothesis Testing

31. What is a null hypothesis and an alternative hypothesis?

The null hypothesis (H0) is a statement of no effect or no difference, representing the default assumption that any observed pattern is due to chance. The alternative hypothesis (H1 or Ha) is the claim being tested, asserting that there is a real effect, difference, or relationship in the population.

32. What is the significance level (alpha)?

The significance level, commonly denoted as alpha, is the threshold probability used to decide whether to reject the null hypothesis, typically set at 0.05. It represents the maximum acceptable probability of making a Type I error (rejecting a true null hypothesis), so an alpha of 0.05 means there is a 5% risk of a false positive.

33. What is a p-value?

A p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. If the p-value is less than alpha, we reject the null hypothesis; otherwise, we fail to reject it. A smaller p-value indicates stronger evidence against the null hypothesis.

34. What is a Type I error?

A Type I error (false positive) occurs when the null hypothesis is rejected even though it is actually true. The probability of a Type I error is equal to the significance level alpha. For example, concluding that a drug is effective when it actually has no effect is a Type I error.

35. What is a Type II error, and what is statistical power?

A Type II error (false negative) occurs when we fail to reject the null hypothesis even though it is actually false, with its probability denoted as beta. Statistical power, defined as 1 - beta, is the probability of correctly rejecting a false null hypothesis. Power can be increased with a larger sample size or a higher significance level, and it is naturally higher when the true effect size is larger.

36. What is the difference between a one-tailed and a two-tailed test?

A one-tailed test evaluates whether a parameter is either greater than or less than a specific value, concentrating the entire rejection region in one tail of the distribution. A two-tailed test checks for a difference in either direction, splitting the rejection region between both tails. One-tailed tests are more powerful for detecting effects in a specific direction but miss effects in the opposite direction.

37. What is a t-test, and what are its types?

A t-test is a statistical test used to determine whether there is a significant difference between the means of one or two groups when the population standard deviation is unknown. The three main types are: one-sample t-test (comparing a sample mean to a known value), independent two-sample t-test (comparing means of two unrelated groups), and paired t-test (comparing means from the same group at different times or conditions).
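
A minimal sketch of the one-sample t statistic, t = (x-bar - mu0) / (s / sqrt(n)), using only the standard library (the sample values are made up):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample: does the mean differ from mu0 = 100?
sample = [102, 98, 105, 110, 97, 103, 101, 99]
mu0 = 100

n = len(sample)
t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
print(round(t, 3))  # ~1.256

# The p-value would come from the t-distribution with n - 1 = 7 degrees
# of freedom; in practice a library routine such as scipy.stats.ttest_1samp
# reports both t and p in one call.
```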

38. When would you use a z-test instead of a t-test?

A z-test is used when the population standard deviation is known and the sample size is large (typically n is 30 or more), as the sampling distribution of the mean is approximately normal under these conditions. In practice, z-tests are less common than t-tests because the population standard deviation is rarely known, and t-tests converge to similar results as sample size grows.

39. What is a chi-square test?

A chi-square test is a non-parametric test used to examine the association between categorical variables. The chi-square test of independence determines whether two categorical variables are related, while the chi-square goodness-of-fit test checks whether an observed frequency distribution matches an expected distribution. It compares observed and expected frequencies using the formula sum of (observed - expected) squared / expected.
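
A worked goodness-of-fit example with made-up die-roll counts, applying the sum of (observed - expected)^2 / expected directly:

```python
# Goodness-of-fit: is a die fair? 60 rolls, so 10 expected per face.
observed = [8, 12, 9, 11, 6, 14]
expected = [10] * 6

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 1))  # 4.2

# Compared against the chi-square distribution with 6 - 1 = 5 degrees of
# freedom, 4.2 is below the 0.05 critical value (~11.07), so these counts
# are consistent with a fair die.
```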

40. What is ANOVA, and when is it used?

ANOVA (Analysis of Variance) is used to test whether the means of three or more groups are significantly different from each other. It works by comparing the variance between groups to the variance within groups using an F-statistic. If the F-statistic is large enough to be significant, it indicates that at least one group mean differs, but post-hoc tests (such as Tukey's HSD) are needed to identify which specific groups differ.
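
The F-statistic can be computed by hand from the between-group and within-group sums of squares; the three groups below are hypothetical:

```python
from statistics import mean

groups = [
    [23, 25, 21, 22, 24],   # hypothetical group A
    [30, 28, 31, 29, 27],   # group B
    [26, 24, 27, 25, 23],   # group C
]

k = len(groups)                      # number of groups
n = sum(len(g) for g in groups)      # total observations
grand = mean(x for g in groups for x in g)

# Between-group sum of squares: each group mean vs the grand mean
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
# Within-group sum of squares: each value vs its own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# F = mean square between / mean square within
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 2))  # ~18.67, well above the 0.05 critical value (~3.89)
```

A significant F only says that some group mean differs; a post-hoc test would then compare the pairs.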

Correlation and Regression

41. What is the difference between correlation and causation?

Correlation measures the strength and direction of a linear relationship between two variables, but it does not imply that one variable causes the other. Causation means that a change in one variable directly produces a change in another. Establishing causation typically requires controlled experiments or rigorous methods like randomized controlled trials to rule out confounding variables.

42. What is Pearson correlation, and what are its properties?

The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two continuous variables, ranging from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship), with 0 indicating no linear relationship. It assumes that both variables are normally distributed, the relationship is linear, and there are no significant outliers.

43. What is Spearman correlation, and how does it differ from Pearson?

Spearman's rank correlation coefficient measures the strength and direction of the monotonic relationship between two variables by using the ranks of the data rather than the raw values. Unlike Pearson correlation, Spearman does not assume a linear relationship or normal distribution, making it more suitable for ordinal data or when the relationship is monotonic but not necessarily linear.

44. What is simple linear regression?

Simple linear regression models the relationship between one independent variable (predictor) and one dependent variable (response) by fitting a straight line of the form y = b0 + b1 * x, where b0 is the intercept and b1 is the slope. The coefficients are estimated using the ordinary least squares method, which minimizes the sum of squared differences between observed and predicted values.
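
The OLS estimates have closed forms — b1 is the covariance of x and y divided by the variance of x, and b0 = y-bar - b1 * x-bar — which a few lines of stdlib Python can reproduce:

```python
from statistics import mean

def ols_fit(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    mx, my = mean(x), mean(y)
    b1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # exactly y = 1 + 2x

b0, b1 = ols_fit(x, y)
print(b0, b1)  # 1.0 2.0
```

On noisy data the fitted line would not pass through every point, and the residuals (observed minus predicted) become the main diagnostic.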

45. What is multiple regression?

Multiple regression extends simple linear regression by including two or more independent variables to predict a single dependent variable, taking the form y = b0 + b1*x1 + b2*x2 + ... + bn*xn. It allows analysts to examine the effect of each predictor while controlling for the others, providing a more comprehensive model when multiple factors influence the outcome.

46. What is R-squared, and what does it tell you?

R-squared (coefficient of determination) represents the proportion of variance in the dependent variable that is explained by the independent variable(s) in a regression model, ranging from 0 to 1. An R-squared of 0.85, for example, means that 85% of the variability in the response is accounted for by the model. However, a high R-squared does not necessarily mean the model is good, as it can be inflated by adding irrelevant predictors.

47. What is adjusted R-squared, and why is it preferred over R-squared?

Adjusted R-squared modifies the R-squared value by penalizing the addition of predictors that do not improve the model, accounting for the number of independent variables relative to the sample size. Unlike R-squared, which always increases (or stays the same) when new predictors are added, adjusted R-squared can decrease if a new variable does not contribute meaningful explanatory power. This makes it a better metric for comparing models with different numbers of predictors.

48. What are the key assumptions of linear regression?

The key assumptions of linear regression are: linearity (the relationship between predictors and the response is linear), independence (observations are independent of each other), homoscedasticity (constant variance of residuals), normality (residuals are normally distributed), and no multicollinearity (predictors are not highly correlated with each other). Violations of these assumptions can lead to biased or inefficient coefficient estimates and unreliable predictions.

49. What are residuals, and why are they important?

Residuals are the differences between the observed values and the values predicted by the regression model, calculated as residual = observed - predicted. Analyzing residuals is crucial for diagnosing model fit: ideally, residuals should be randomly scattered around zero with constant variance and no discernible pattern. Patterns in residual plots can reveal problems such as non-linearity, heteroscedasticity, or the presence of outliers.

50. What is multicollinearity, and how do you detect it?

Multicollinearity occurs when two or more independent variables in a regression model are highly correlated with each other, making it difficult to isolate the individual effect of each predictor. It can be detected using the Variance Inflation Factor (VIF), where a VIF value exceeding 5 or 10 typically indicates problematic multicollinearity. Solutions include removing one of the correlated variables, combining them into a single variable, or using regularization techniques like ridge regression.
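
With only two predictors, the VIF reduces to 1 / (1 - r^2), where r is their Pearson correlation, so it can be sketched without a regression library (the predictor values below are made up):

```python
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) ** 0.5
                  * sum((b - my) ** 2 for b in y) ** 0.5)

# Two hypothetical, strongly correlated predictors
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]  # roughly 2 * x1

r = pearson_r(x1, x2)
vif = 1 / (1 - r ** 2)   # with two predictors, R^2 = r^2
print(round(vif, 1))     # far above the common threshold of 5
```

With more predictors, each VIF comes from regressing that predictor on all the others, which is usually delegated to a statistics library.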