Time Series Fundamentals
1. What is a time series?
A time series is a sequence of data points collected or recorded at successive points in time, typically at uniform intervals. Examples include daily stock prices, monthly sales figures, hourly sensor readings, and annual GDP. Time series analysis aims to extract meaningful patterns (trends, seasonality, cycles) and build models that use those patterns to forecast future values. Unlike cross-sectional data, observations in a time series are ordered and often autocorrelated — the value at time t is related to values at prior time steps.
2. What are the four components of a time series?
A time series can be decomposed into four components: Trend (the long-term direction — increasing, decreasing, or flat), Seasonality (regular, repeating patterns over a fixed period — daily, weekly, monthly, annual), Cyclical patterns (irregular fluctuations over longer periods due to economic cycles — unlike seasonality, cycles have variable length), and Irregular/Residual (random noise remaining after removing the other components). Additive decomposition: Y = Trend + Seasonal + Cyclical + Residual. Multiplicative decomposition: Y = Trend × Seasonal × Cyclical × Residual (used when seasonal effects grow proportionally with the level).
3. What is stationarity and why does it matter?
A time series is stationary when its statistical properties (mean, variance, and autocorrelation) are constant over time. Stationarity matters because classical models such as ARMA assume a stationary series, and ARIMA assumes the series becomes stationary after differencing. Fitting such a model to a non-stationary series, for example one with a trend or a unit root, can produce spurious correlations and unreliable forecasts. Tests for stationarity include the Augmented Dickey-Fuller (ADF) test (null: unit root present, i.e. non-stationary), the KPSS test (null: stationary), and the Phillips-Perron test. If the series is non-stationary, differencing is applied to remove trends and unit roots, often after a log (or Box-Cox) transformation to stabilise a variance that grows with the level.
4. What is the difference between trend and seasonality?
A trend is the long-term, underlying direction of a time series — a consistent increase or decrease over months or years (e.g., rising annual revenue, increasing global temperature). Seasonality is a periodic, predictable pattern that repeats at known, fixed intervals (e.g., higher retail sales every December, higher ice cream sales every summer). Seasonality is calendar-driven and has a fixed, known period. A trend may coexist with seasonality — for example, rising annual sales with seasonal peaks every Q4. Decomposition separates them to analyse each independently.
5. What is autocorrelation and the ACF plot?
Autocorrelation measures the correlation of a time series with a lagged version of itself. The autocorrelation at lag k is Corr(Y_t, Y_{t−k}). The ACF (Autocorrelation Function) plot displays autocorrelations at multiple lags with confidence bands (typically ±1.96/√n). Spikes outside the bands indicate significant autocorrelations. In a random walk, the ACF decays very slowly, so many lags appear significant. In a stationary AR process, autocorrelations decay exponentially (possibly with oscillation). In an MA(q) process, autocorrelations cut off sharply after lag q. The ACF and PACF plots together are used to identify appropriate ARIMA model orders.
6. What is the PACF and how does it differ from ACF?
The PACF (Partial Autocorrelation Function) measures the correlation between Y_t and Y_{t−k} after removing the effects of the intermediate lags Y_{t−1}, ..., Y_{t−k+1}. While the ACF shows the total correlation at each lag, the PACF isolates the direct relationship. For order identification: in a pure MA(q) process, the ACF cuts off after lag q and the PACF decays geometrically; in a pure AR(p) process, the PACF cuts off after lag p and the ACF decays geometrically. Together, they form the Box-Jenkins identification framework for selecting ARMA model orders.
7. What is a random walk?
A random walk is a non-stationary process where each value equals the previous value plus a random shock: Y_t = Y_{t−1} + ε_t, where ε_t is white noise. It has a unit root: its ACF decays very slowly toward zero, whereas the ACF of a stationary AR(1) process decays exponentially. Stock prices are commonly modelled as random walks, consistent with the Efficient Market Hypothesis. Taking first differences of a random walk (Y_t − Y_{t−1} = ε_t) produces a stationary white noise process. A random walk with drift adds a constant, Y_t = c + Y_{t−1} + ε_t, creating a trending series whose average step is c.
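The following NumPy sketch (simulated data, arbitrary drift c = 0.2) builds a random walk and a random walk with drift by cumulative summation and shows that first differencing recovers the white noise shocks, plus the constant c in the drift case.

```python
import numpy as np

rng = np.random.default_rng(3)
n, drift = 1000, 0.2
eps = rng.normal(size=n)                 # white noise shocks

walk = np.cumsum(eps)                    # Y_t = Y_{t-1} + e_t  (with Y_0 = e_0)
walk_drift = np.cumsum(drift + eps)      # Y_t = c + Y_{t-1} + e_t

diffs = np.diff(walk)                    # Y_t - Y_{t-1} = e_t
diffs_drift = np.diff(walk_drift)        # Y_t - Y_{t-1} = c + e_t
print(f"mean step of the drifting walk: {diffs_drift.mean():.3f}")
```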
8. What is differencing and when is it used?
Differencing transforms a non-stationary series into a stationary one by computing the difference between consecutive observations: ΔY_t = Y_t − Y_{t−1}. If one round of differencing achieves stationarity, the series is called I(1) (integrated of order 1); if it requires differencing twice, it is I(2). Seasonal differencing removes seasonality: Δ_s Y_t = Y_t − Y_{t−s}, where s is the seasonal period. Over-differencing (applying more differences than needed) should be avoided because it introduces artificial negative autocorrelation, i.e. an unnecessary, non-invertible moving average component. The number of differences required is the "d" parameter in ARIMA(p,d,q).
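A small pandas sketch of first and seasonal differencing on a hypothetical monthly series with a linear trend and a fixed 12-month pattern. Here the seasonal difference leaves a constant (the trend's 12-month increase), and one further first difference removes it.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend plus a repeating 12-month pattern
months = pd.date_range("2020-01-01", periods=48, freq="MS")
pattern = np.tile([5, 3, 0, -2, -4, -5, -4, -2, 0, 2, 4, 6], 4)
y = pd.Series(100 + 2.0 * np.arange(48) + pattern, index=months)

first_diff = y.diff()             # ΔY_t = Y_t - Y_{t-1}
seasonal_diff = y.diff(12)        # Δ_12 Y_t = Y_t - Y_{t-12}: pattern cancels, 12 * 2.0 remains
both = y.diff(12).diff()          # seasonal then first difference: constant removed
print(seasonal_diff.dropna().unique())
```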
9. What is white noise?
White noise is a stationary time series with zero mean, constant variance, and no autocorrelation at any lag; in the strict sense (i.i.d. white noise), observations are also independent and identically distributed. Checking whether model residuals are white noise is the primary diagnostic for time series models: if residuals contain systematic patterns (significant ACF spikes), the model has not captured all the information and needs improvement. The Ljung-Box test formally tests whether a group of autocorrelations is jointly zero (null: no autocorrelation up to lag h). A good model leaves only white noise residuals.
10. What is cointegration?
Cointegration describes a long-run equilibrium relationship between two or more non-stationary time series that tend to move together over time such that a linear combination of them is stationary. For example, stock prices of two companies in the same sector may both be I(1) random walks but their price spread may be stationary — they are cointegrated. Cointegration is tested with the Engle-Granger test or Johansen test. Cointegrated series are modelled with a Vector Error Correction Model (VECM) which captures both short-term dynamics and long-term equilibrium adjustment. It is the basis for pairs trading strategies.
ARIMA Models
11. What is ARIMA?
ARIMA (AutoRegressive Integrated Moving Average) is a widely used statistical model for univariate time series forecasting. It has three components: AR (AutoRegressive) — the current value is a linear combination of p past values; I (Integrated) — the series has been differenced d times to achieve stationarity; MA (Moving Average) — the current value depends on q past error terms. ARIMA(p,d,q) is specified after identifying the appropriate orders using ACF/PACF plots and the ADF test. It models linear dependencies and works well for short-to-medium term forecasting on stationary or made-stationary series.
12. How do you select ARIMA parameters (p, d, q)?
The Box-Jenkins methodology: (1) Test for stationarity (ADF test). If non-stationary, apply d differences. (2) Plot ACF and PACF of the stationary series: if ACF tails off and PACF cuts off after lag p → AR(p); if PACF tails off and ACF cuts off after lag q → MA(q); if both tail off → ARMA(p,q). (3) Estimate multiple candidate models and select using information criteria: AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) — lower is better. (4) Check residuals for white noise. Python's auto_arima (pmdarima) automates this search.
13. What is SARIMA and when is it used?
SARIMA (Seasonal ARIMA), written as ARIMA(p,d,q)(P,D,Q)[s], extends ARIMA to handle seasonality. The seasonal component (P,D,Q) at seasonal period s captures patterns that repeat every s periods (e.g., s=12 for monthly data with annual seasonality). Seasonal differencing (D) removes seasonal non-stationarity. Seasonal AR (P) and MA (Q) terms model autocorrelation at seasonal lags. For example, ARIMA(1,1,1)(1,1,1)[12] models a monthly series with both trend and annual seasonality. SARIMA is appropriate when the ACF shows significant spikes at seasonal lags (e.g., lags 12, 24, 36 for monthly data).
14. What is the difference between AR and MA components?
An AR(p) model expresses the current value as a linear combination of the p most recent past values plus white noise: Y_t = φ₁Y_{t−1} + φ₂Y_{t−2} + ... + φₚY_{t−p} + ε_t. The impact of a shock decays gradually over time but never fully disappears (infinite memory). An MA(q) model expresses the current value as a linear combination of the current and q most recent random shocks: Y_t = ε_t + θ₁ε_{t−1} + ... + θ_qε_{t−q}. Shocks have finite impact: after q periods, their effect disappears (finite memory). Many real-world series are modelled more parsimoniously by combining both, as ARMA(p,q).
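The difference in memory can be seen by feeding a single unit shock through each model. This plain-NumPy sketch (the helper `impulse_response` and the coefficients are illustrative) shows the AR(1) response decaying geometrically while the MA(2) response is exactly zero after lag 2.

```python
import numpy as np

def impulse_response(phi=(), theta=(), horizon=10):
    """Response of Y to one unit shock at t = 0 in the ARMA model
    Y_t = sum_i phi_i Y_{t-i} + e_t + sum_j theta_j e_{t-j}."""
    e = np.zeros(horizon)
    e[0] = 1.0                          # a single shock, then no further shocks
    y = np.zeros(horizon)
    for t in range(horizon):
        ar_part = sum(p * y[t - i - 1] for i, p in enumerate(phi) if t - i - 1 >= 0)
        ma_part = sum(th * e[t - j - 1] for j, th in enumerate(theta) if t - j - 1 >= 0)
        y[t] = ar_part + e[t] + ma_part
    return y

ar1 = impulse_response(phi=[0.8], horizon=8)          # 1, 0.8, 0.64, ... (never exactly 0)
ma2 = impulse_response(theta=[0.5, 0.3], horizon=8)   # 1, 0.5, 0.3, then exactly 0
print(np.round(ar1, 3), ma2)
```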
15. How do you check if an ARIMA model is adequate?
Model adequacy is assessed through residual diagnostics: (1) ACF/PACF of residuals should show no significant spikes — if spikes remain, the model order is insufficient; (2) Ljung-Box test should fail to reject the null (p-value > 0.05), confirming no residual autocorrelation; (3) Residuals should be approximately normally distributed (histogram, Q-Q plot); (4) Residuals should have constant variance (no ARCH effects — heteroscedasticity). Information criteria (AIC, BIC) compare competing models. Out-of-sample forecast evaluation (RMSE, MAE on a held-out test set) is the ultimate performance check.
Forecasting Models
16. What is exponential smoothing?
Exponential smoothing is a family of weighted averaging methods where recent observations receive higher weights that decay exponentially for older observations, controlled by a smoothing parameter α (0 < α < 1). Simple Exponential Smoothing (SES) handles series with no trend or seasonality. Holt's Method (Double Exponential Smoothing) adds a trend component (parameters α, β). Holt-Winters (Triple Exponential Smoothing) adds seasonality (parameters α, β, γ) and can be additive or multiplicative. Unlike ARIMA, exponential smoothing models are intuitive, computationally efficient, and often competitive with more complex models.
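A plain-Python sketch of Simple Exponential Smoothing, assuming the level is initialised with the first observation (libraries such as statsmodels' `SimpleExpSmoothing` can estimate the initial level and α instead). Unrolling the recursion shows that each earlier observation's weight shrinks by a factor of (1 − α) per step back in time.

```python
def simple_exponential_smoothing(y, alpha):
    """Return the final smoothed level, used by SES as the flat forecast for every horizon.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}, with level_0 = y_0
    (one simple initialisation choice).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be in (0, 1)")
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

# Levels: 0, 0, 0, then 0.5*10 + 0.5*0 = 5, then 0.5*10 + 0.5*5 = 7.5
forecast = simple_exponential_smoothing([0, 0, 0, 10, 10], alpha=0.5)
print(forecast)
```

A larger α reacts faster to the jump to 10; a smaller α smooths more heavily.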
17. What is the difference between additive and multiplicative seasonality?
Additive seasonality assumes the seasonal effect is constant regardless of the level of the series — the seasonal fluctuation adds a fixed amount. For example, sales always increase by exactly 500 units in December regardless of average monthly sales. Multiplicative seasonality assumes the seasonal effect is proportional to the level — the fluctuation is a percentage of the current level. For example, December sales are always 150% of the annual average. Multiplicative seasonality is more appropriate when variance increases with the level of the series (identifiable as a cone-shaped pattern on the plot). Log transformation converts multiplicative to additive.
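A NumPy sketch (hypothetical data with a 3% per-period trend and a 4-period seasonal pattern) showing why the log transform helps: the within-cycle seasonal swing grows with the level in the original units but is constant after taking logs, because log(T × S) = log T + log S.

```python
import numpy as np

t = np.arange(48)
trend = 100 * 1.03 ** t                                 # level grows 3% per period
seasonal_factor = np.tile([0.8, 1.0, 1.2, 1.0], 12)    # hypothetical seasonal factors
y = trend * seasonal_factor                             # multiplicative: Y = T x S

cycles = y.reshape(12, 4)                               # one row per seasonal cycle
log_cycles = np.log(y).reshape(12, 4)
swing_levels = cycles.max(axis=1) - cycles.min(axis=1)          # grows with the level
swing_logs = log_cycles.max(axis=1) - log_cycles.min(axis=1)    # constant across cycles
print(np.round(swing_levels[:3], 1), np.round(swing_logs[:3], 4))
```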
18. What is a seasonal decomposition of time series (STL)?
STL (Seasonal and Trend decomposition using Loess) is a decomposition method that separates a time series into trend, seasonal, and residual components using locally weighted regression (LOESS). It handles any seasonal period, allows the seasonal component to change over time, and, unlike classical decomposition, can be made robust to outliers (robust=True in statsmodels). STL is implemented in Python as statsmodels.tsa.seasonal.STL. After STL decomposition, the seasonally adjusted data (trend + residual) can be modelled with ARIMA and the seasonal component added back for forecasting. It is also used for anomaly detection.
19. What is the Prophet forecasting model?
Prophet, developed by Facebook (Meta), is an open-source forecasting model designed for business time series with multiple seasonality periods (daily, weekly, annual), holiday effects, and changepoints in trend. It decomposes the series as: y(t) = trend(t) + seasonality(t) + holidays(t) + error. The trend uses piecewise linear or logistic growth with automatic or specified changepoints. Prophet is robust to missing data, handles non-uniform time intervals, and requires minimal tuning, making it accessible to non-statisticians. It is implemented in Python and R and widely used for business forecasting.
20. What is cross-validation for time series?
Standard k-fold cross-validation randomly shuffles data, which is inappropriate for time series because future observations would leak into training. Time series cross-validation (rolling-origin or walk-forward validation) instead moves the forecast origin forward through time, using either an expanding training window or a fixed-size sliding window. For each fold: train on data up to time t, forecast the next h periods, record the error, then advance t. This simulates real-world deployment and produces a realistic estimate of forecast performance. Scikit-learn's TimeSeriesSplit implements this (expanding by default; max_train_size gives a sliding window). RMSE, MAE, and MAPE averaged across folds are used to compare models.
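A sketch of walk-forward evaluation using scikit-learn's `TimeSeriesSplit` (the `test_size` argument requires scikit-learn 0.24 or later). The ten data points are made up, and a naïve last-value forecast stands in for a real model.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.array([10, 12, 13, 12, 15, 16, 18, 17, 19, 21], dtype=float)

tscv = TimeSeriesSplit(n_splits=3, test_size=2)    # expanding training window
fold_mae = []
for train_idx, test_idx in tscv.split(y):
    assert train_idx.max() < test_idx.min()        # never train on the future
    forecast = np.repeat(y[train_idx[-1]], len(test_idx))   # naive baseline "model"
    fold_mae.append(np.mean(np.abs(y[test_idx] - forecast)))
    print(train_idx, test_idx, fold_mae[-1])

print("mean MAE across folds:", np.mean(fold_mae))
```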
21. What are common forecast accuracy metrics?
MAE (Mean Absolute Error) = average of |actual − forecast|; it is in the original units and less sensitive to outliers than RMSE. RMSE (Root Mean Squared Error) = square root of the mean squared error; it penalises large errors more heavily and is also in the original units. MAPE (Mean Absolute Percentage Error) = average of |actual − forecast| / |actual| × 100%; it is scale-independent and interpretable as a percentage, but it is undefined for zero actuals, unstable near zero, and penalises over-forecasts more heavily than under-forecasts. sMAPE (Symmetric MAPE) divides by the average of |actual| and |forecast| to reduce this asymmetry, although it is not fully symmetric itself and several definitions are in use. MASE (Mean Absolute Scaled Error) divides the MAE by the in-sample MAE of a naïve (or seasonal naïve) forecast, enabling comparison across series. No single metric is universally best: RMSE is common in ML pipelines, MAPE in business reporting.
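The metrics above can be written in a few lines of NumPy. This sketch uses made-up numbers; sMAPE has several definitions in the literature, and the one below (dividing by the mean of |actual| and |forecast|) is only one of them. The `m` argument of the illustrative `mase` helper selects naïve (m = 1) or seasonal naïve (m = s) scaling.

```python
import numpy as np

def mae(a, f):
    return np.mean(np.abs(a - f))

def rmse(a, f):
    return np.sqrt(np.mean((a - f) ** 2))

def mape(a, f):
    if np.any(a == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * np.mean(np.abs((a - f) / a))

def smape(a, f):
    return 100 * np.mean(2 * np.abs(a - f) / (np.abs(a) + np.abs(f)))

def mase(a, f, train, m=1):
    """MAE scaled by the in-sample MAE of the (seasonal) naive forecast with period m."""
    scale = np.mean(np.abs(train[m:] - train[:-m]))
    return mae(a, f) / scale

train = np.array([90.0, 95.0, 100.0])      # in-sample naive errors: 5, 5
actual = np.array([100.0, 200.0, 300.0])
forecast = np.array([110.0, 190.0, 330.0])  # errors: 10, 10, 30
print(mae(actual, forecast), rmse(actual, forecast),
      mape(actual, forecast), smape(actual, forecast), mase(actual, forecast, train))
```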
22. What is a naïve forecast and when is it used as a baseline?
A naïve forecast simply uses the last observed value as the prediction for all future periods: ŷ_{t+h} = Y_t. A seasonal naïve forecast uses the value from the same period in the most recent season: ŷ_{t+h} = Y_{t+h−s} for h ≤ s, where s is the seasonal period (longer horizons repeat the last observed season). Naïve forecasts serve as benchmarks: a model is only useful if it outperforms a naïve baseline. For many financial time series (stock prices, random walks), naïve forecasts are hard to beat. MASE divides model errors by (in-sample) naïve forecast errors, so a MASE < 1 means the model beats the naïve baseline.
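A NumPy sketch of both baselines on a hypothetical series with period s = 4; the function names are illustrative.

```python
import numpy as np

def naive_forecast(y, h):
    """Repeat the last observation: y_hat_{T+h} = Y_T."""
    return np.repeat(y[-1], h)

def seasonal_naive_forecast(y, h, s):
    """Repeat the last observed season: y_hat_{T+h} = Y_{T+h-s*k}, with k = ceil(h/s)."""
    last_season = np.asarray(y[-s:])
    return np.array([last_season[i % s] for i in range(h)])

y = np.array([1, 2, 3, 4, 11, 12, 13, 14])    # hypothetical data, period s = 4
print(naive_forecast(y, 3))                    # [14 14 14]
print(seasonal_naive_forecast(y, 6, s=4))      # [11 12 13 14 11 12]
```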
23. What is the Dickey-Fuller test?
The Augmented Dickey-Fuller (ADF) test tests the null hypothesis that a unit root is present in a time series (i.e., the series is non-stationary). A low p-value (typically < 0.05) rejects the null, indicating stationarity. The test regresses ΔY_t on Y_{t−1} and the lagged differences ΔY_{t−1}, ..., ΔY_{t−p} (optionally with a constant and trend); the t-statistic on the Y_{t−1} coefficient is compared with Dickey-Fuller critical values rather than the usual Student t values. The number of lagged differences is typically selected by AIC/BIC. The test has low power (it may fail to reject for near-unit-root processes) and is sensitive to structural breaks. The KPSS test (null: stationary) should be used alongside ADF for confirmation.
24. What is the Box-Jenkins methodology?
Box-Jenkins is a systematic three-stage process for fitting ARIMA models: (1) Identification: plot the series, check for stationarity (ADF test), apply differencing if needed, and plot the ACF/PACF of the stationary series to determine candidate p and q orders; (2) Estimation: estimate the parameters of candidate models by maximum likelihood and select the best model by AIC/BIC; (3) Diagnostic checking: verify that the residuals are white noise (residual ACF plot, Ljung-Box test). If diagnostics fail, return to step 1. The cycle is repeated until a satisfactory model is found. It remains the standard framework for classical time series modelling, even alongside newer automated approaches.
25. What is a Vector Autoregression (VAR) model?
A VAR model extends univariate autoregression to multiple time series, modelling each variable as a linear combination of lagged values of all variables in the system. VAR(p) has p lags for each equation. It captures the mutual interdependence between multiple time series simultaneously — for example, modelling GDP growth and inflation together, where each affects the other. VAR is estimated by OLS for each equation separately. Granger causality tests within a VAR framework test whether one variable helps predict another. Impulse response functions trace the effect of a shock in one variable on all others over time.
Seasonality & Advanced Topics
26. How do you detect and remove seasonality?
Seasonality is detected by visual inspection (clear repeating patterns), ACF plots (significant spikes at seasonal lags), and decomposition plots. Removal methods include seasonal differencing (subtract the value from the same period last season), classical decomposition (divide by or subtract the seasonal component), STL decomposition (subtract the seasonal component), and including seasonal dummy variables in regression models. Seasonally adjusted series isolate the underlying trend and cycle. Tools such as statsmodels.tsa.seasonal.seasonal_decompose and statsmodels.tsa.seasonal.STL handle decomposition automatically.
27. What is the Fourier series approach to seasonality?
Fourier series model seasonality using sine and cosine functions of different frequencies. Instead of dummy variables for each period (which require many parameters for high-frequency seasonality), a small number of Fourier terms can capture complex seasonal patterns: seasonal_component = Σ[a_k × sin(2πkt/s) + b_k × cos(2πkt/s)] for k=1 to K, where K controls smoothness. This approach works well for multiple seasonality periods (weekly + annual) and is used by Prophet and auto_arima with Fourier terms. Adding Fourier terms as regressors in ARIMA/regression models is efficient and flexible.
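A NumPy sketch of Fourier regression, assuming daily data with an annual period s = 365.25 and K = 2 harmonics (both illustrative). Five coefficients (an intercept plus two sine/cosine pairs) describe a seasonal shape that would otherwise need hundreds of dummy variables; on this noise-free example least squares recovers them exactly.

```python
import numpy as np

def fourier_terms(t, s, K):
    """Columns sin(2*pi*k*t/s), cos(2*pi*k*t/s) for k = 1..K."""
    cols = []
    for k in range(1, K + 1):
        cols.append(np.sin(2 * np.pi * k * t / s))
        cols.append(np.cos(2 * np.pi * k * t / s))
    return np.column_stack(cols)

s, K = 365.25, 2
t = np.arange(3 * 365)
seasonal = 3 * np.sin(2 * np.pi * t / s) + 1.5 * np.cos(2 * np.pi * 2 * t / s)
y = 20 + seasonal

X = np.column_stack([np.ones(len(t)), fourier_terms(t, s, K)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # [intercept, a1, b1, a2, b2]
print(np.round(coef, 6))
```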
28. What is changepoint detection?
Changepoint detection identifies points in a time series where the underlying statistical properties (mean, variance, trend) shift abruptly. Methods include PELT (Pruned Exact Linear Time), BOCPD (Bayesian Online Changepoint Detection), and Prophet's automatic changepoint detection using L1 regularisation. Changepoints correspond to real-world events: product launches, policy changes, crises, or seasonal transitions. Detecting and modelling changepoints prevents prior-period patterns from distorting future forecasts. Python libraries include ruptures for offline detection and changefinder for online detection.
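As a simplified stand-in for methods like PELT (not an implementation of them), this NumPy sketch finds the single split that best explains a mean shift by minimising the within-segment sum of squared errors; binary segmentation applies the same idea recursively. The data and the `min_size` parameter are illustrative.

```python
import numpy as np

def best_single_changepoint(y, min_size=5):
    """Index i that minimises the total squared error of the segments y[:i] and y[i:],
    each modelled by its own mean (a single shift in mean)."""
    y = np.asarray(y, dtype=float)
    best_idx, best_cost = None, np.inf
    for i in range(min_size, len(y) - min_size + 1):
        left, right = y[:i], y[i:]
        cost = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx

rng = np.random.default_rng(13)
y = np.concatenate([rng.normal(10, 1, 120), rng.normal(14, 1, 80)])   # shift at index 120
cp = best_single_changepoint(y)
print(cp)
```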
29. What is the difference between in-sample and out-of-sample evaluation?
In-sample evaluation measures how well a model fits the training data it was estimated on — it is optimistically biased because the model's parameters were chosen to minimise in-sample errors. Out-of-sample evaluation measures performance on data the model has never seen (a hold-out test set), providing an unbiased estimate of generalisation ability. For time series, the test set must be the most recent data (not random). A good model has a small gap between in-sample and out-of-sample errors. In-sample metrics only (e.g., R²) can be highly misleading if the model is overfitted.
30. What is a prediction interval vs. a confidence interval in forecasting?
A confidence interval quantifies uncertainty about an estimated parameter (e.g., the mean of a distribution). A prediction interval quantifies uncertainty about a single future observation, which is always wider than a confidence interval because it accounts for both parameter uncertainty and the inherent randomness of the future observation (ε). For example, an ARIMA model might predict 100 units ± 15 (90% prediction interval) for next month's sales, meaning there is a 90% chance the actual value falls between 85 and 115. Prediction intervals are what practitioners should communicate and act on for planning and risk management.
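A sketch, assuming a random walk with known shock standard deviation σ, of how prediction intervals widen with the horizon: the h-step forecast error has variance hσ², so the half-width is z·σ·√h. Real intervals would also include parameter uncertainty, so this understates the true width.

```python
import numpy as np
from scipy.stats import norm

def random_walk_interval(last_value, sigma, horizons, level=0.90):
    """Prediction intervals for a random walk: Y_{T+h} ~ N(Y_T, h * sigma^2)."""
    z = norm.ppf(0.5 + level / 2)           # about 1.645 for a 90% interval
    h = np.asarray(horizons, dtype=float)
    half_width = z * sigma * np.sqrt(h)
    return last_value - half_width, last_value + half_width

lower, upper = random_walk_interval(last_value=100.0, sigma=5.0, horizons=[1, 4, 9])
print(np.round(lower, 1), np.round(upper, 1))   # widths grow like sqrt(h): 1x, 2x, 3x
```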
31. What is Granger causality?
Granger causality tests whether one time series X "Granger-causes" another series Y — meaning that past values of X significantly improve the prediction of Y beyond what past values of Y alone can explain. It is estimated within a VAR framework by comparing a restricted model (Y regressed on own lags only) to an unrestricted model (Y regressed on own lags and lags of X), using an F-test. Importantly, Granger causality is about predictive ability, not true causality — X Granger-causes Y means X contains information useful for predicting Y, not necessarily that X causes Y in a causal mechanism sense.
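The restricted-versus-unrestricted comparison can be sketched directly with NumPy and SciPy. The helper `granger_f_test` is hypothetical (statsmodels provides `grangercausalitytests` and `VARResults.test_causality` for real use), and the simulated data make x Granger-cause y by construction.

```python
import numpy as np
from scipy import stats

def granger_f_test(y, x, p=1):
    """F-test of H0: lags 1..p of x add nothing to an AR(p) model of y."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    Y = y[p:]
    own = np.column_stack([y[p - k:n - k] for k in range(1, p + 1)])     # y_{t-1..t-p}
    other = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])   # x_{t-1..t-p}
    const = np.ones((n - p, 1))
    X_r = np.hstack([const, own])              # restricted: own lags only
    X_u = np.hstack([const, own, other])       # unrestricted: own lags + lags of x

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        r = Y - X @ beta
        return r @ r

    rss_r, rss_u = rss(X_r), rss(X_u)
    df_num, df_den = p, len(Y) - X_u.shape[1]
    F = ((rss_r - rss_u) / df_num) / (rss_u / df_den)
    return F, stats.f.sf(F, df_num, df_den)

rng = np.random.default_rng(18)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()
F, p_value = granger_f_test(y, x, p=2)
print(round(F, 1), p_value)
```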
32. What is the GARCH model and when is it used?
GARCH (Generalised Autoregressive Conditional Heteroscedasticity) models time-varying volatility in financial time series. Financial returns often exhibit volatility clustering: large price movements tend to be followed by large movements (of either sign). GARCH(p,q) models the conditional variance as a function of past squared residuals (ARCH terms) and past conditional variances (GARCH terms); for GARCH(1,1): σ²_t = ω + α₁ε²_{t−1} + β₁σ²_{t−1}. GARCH volatility forecasts are used in value-at-risk calculations, risk management, and as volatility inputs to option pricing in quantitative finance (they are distinct from implied volatility, which is backed out of observed option prices). A GARCH model sits on top of a mean equation (e.g., a constant or an ARMA model), and the two are estimated either jointly or in two steps.
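A NumPy sketch that simulates a GARCH(1,1) process with illustrative parameters ω = 0.05, α₁ = 0.10, β₁ = 0.85. The returns themselves are nearly uncorrelated, but their squares are positively autocorrelated (volatility clustering), and the sample variance should be close to ω / (1 − α₁ − β₁) = 1. For estimation rather than simulation, a dedicated package such as `arch` would typically be used.

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate eps_t = sigma_t * z_t with
    sigma2_t = omega + alpha * eps_{t-1}^2 + beta * sigma2_{t-1}."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    eps = np.empty(n)
    sigma2 = np.empty(n)
    sigma2[0] = omega / (1 - alpha - beta)          # start at the unconditional variance
    eps[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sigma2[t - 1]
        eps[t] = np.sqrt(sigma2[t]) * z[t]
    return eps, sigma2

eps, sigma2 = simulate_garch11(100_000, omega=0.05, alpha=0.10, beta=0.85)
sq = eps ** 2
lag1_corr_sq = np.corrcoef(sq[1:], sq[:-1])[0, 1]     # volatility clustering
lag1_corr_raw = np.corrcoef(eps[1:], eps[:-1])[0, 1]  # returns nearly uncorrelated
print(round(eps.var(), 3), round(lag1_corr_sq, 3), round(lag1_corr_raw, 3))
```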
33. What is hierarchical time series forecasting?
Hierarchical time series arise when data can be naturally aggregated across a hierarchy, for example total company sales → regional sales → product sales. Bottom-up approaches forecast each bottom-level series and aggregate them. Top-down approaches forecast the total and distribute it to lower levels using historical or forecast proportions. Middle-out approaches forecast an intermediate level, aggregating upward and disaggregating downward. Optimal reconciliation (the MinT method) produces forecasts at all levels that are both coherent (lower-level forecasts sum to the higher levels) and statistically optimal, by minimising the trace of the reconciled forecast error covariance matrix. The hts package in R and other hierarchical forecasting frameworks implement reconciliation.
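A NumPy sketch of a two-level hierarchy (Total = A + B) with hypothetical base forecasts that do not add up. Bottom-up reconciliation aggregates the bottom forecasts; OLS reconciliation projects all base forecasts onto the coherent subspace, which is the special case of MinT where the error covariance is replaced by the identity matrix (full MinT would use an estimated covariance).

```python
import numpy as np

# Rows of the summing matrix S map the bottom series [A, B] to every node [Total, A, B]
S = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=float)

# Independent base forecasts per node: 100 != 55 + 40, so they are incoherent
base = np.array([100.0, 55.0, 40.0])

# Bottom-up: keep the bottom-level forecasts and aggregate them
bottom_up = S @ base[1:]

# OLS reconciliation: y_tilde = S (S'S)^{-1} S' y_hat
P = np.linalg.inv(S.T @ S) @ S.T
ols_reconciled = S @ (P @ base)
print(bottom_up, np.round(ols_reconciled, 3))
```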
34. What is transfer function modelling?
Transfer function models (also called dynamic regression or ARIMAX) extend ARIMA by including exogenous predictor variables as additional inputs. For example, forecasting electricity demand using both past demand (ARIMA terms) and temperature (exogenous variable). The exogenous variable may have immediate or lagged effects on the target. ARIMAX with Fourier terms for seasonality and exogenous regressors is a powerful flexible model. In Python, statsmodels.tsa.statespace.SARIMAX handles exogenous variables. Prophet supports regressors via add_regressor(). Feature engineering for time series (lags, rolling means of predictors) is critical for regression-based models.
35. What is the difference between interpolation and forecasting?
Interpolation estimates values within the range of known data points — filling in missing values between observed data points using techniques like linear interpolation, spline interpolation, or nearest-neighbour. Forecasting (extrapolation) estimates values beyond the known data range, projecting future values based on observed patterns. Interpolation is generally more reliable since the estimated values are surrounded by known data. Forecasting uncertainty grows with the horizon because errors compound and assumptions about future patterns may not hold. Both are used in time series — interpolation for handling missing data, forecasting for prediction.
Deep Learning for Time Series
36. What is LSTM and why is it used for time series?
LSTM (Long Short-Term Memory) is a type of recurrent neural network (RNN) designed to learn long-range dependencies in sequential data. Vanilla RNNs suffer from vanishing gradients, making it difficult to learn patterns more than a few time steps back. LSTM addresses this with a memory cell and three gates (input, forget, output) that control information flow: which information to store, update, or discard. For time series, LSTM can capture complex temporal patterns, multiple seasonalities, and nonlinear relationships that ARIMA cannot. However, they require more data, are harder to tune, and less interpretable.
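A NumPy sketch of a single LSTM time step, following the standard gate equations. Weight shapes, the [input, forget, output, candidate] block ordering, and the random initialisation are assumptions for illustration; deep learning libraries order and initialise these blocks in their own ways.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,); blocks ordered [i, f, o, g]."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])              # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])          # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])      # output gate: how much cell state to expose
    g = np.tanh(z[3 * H:4 * H])      # candidate values
    c_t = f * c_prev + i * g         # additive update eases gradient flow over many steps
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(15)
D, H = 1, 4                          # one input feature, 4 hidden units (arbitrary)
W = rng.normal(scale=0.5, size=(4 * H, D))
U = rng.normal(scale=0.5, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for value in [0.1, 0.4, 0.3, 0.8]:   # a short univariate sequence
    h, c = lstm_step(np.array([value]), h, c, W, U, b)
print(np.round(h, 3))
```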
37. What is the difference between one-step and multi-step forecasting?
One-step forecasting predicts only the next single time step (h=1). Multi-step forecasting predicts multiple steps into the future. Strategies for multi-step forecasting include: recursive (use the 1-step model repeatedly, feeding predictions as inputs for the next step — errors compound), direct (train a separate model for each horizon h), and multi-output (train a single model that outputs all h steps simultaneously). Recursive is simplest but accumulates errors. Direct requires many models but avoids error propagation. Seq2Seq neural networks and Temporal Fusion Transformers directly output multi-step forecasts.
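A NumPy sketch contrasting the recursive and direct strategies with simple linear autoregressions on simulated AR(1) data (lag order p = 3 and horizon H = 5 are arbitrary). The helper functions are hypothetical; at h = 1 both strategies use the same model, so their first forecasts coincide.

```python
import numpy as np

def make_lag_matrix(y, p, h=1):
    """Rows of p lagged values, each paired with the value h steps after the last lag."""
    X, Y = [], []
    for t in range(p, len(y) - h + 1):
        X.append(y[t - p:t])
        Y.append(y[t + h - 1])
    return np.array(X), np.array(Y)

def fit_linear(X, Y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef

def predict_linear(coef, x):
    return coef[0] + x @ coef[1:]

rng = np.random.default_rng(16)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.8 * y[t - 1] + rng.normal()

p, H = 3, 5

# Recursive: one 1-step model, with predictions fed back in as inputs
coef1 = fit_linear(*make_lag_matrix(y, p, h=1))
window = list(y[-p:])
recursive = []
for _ in range(H):
    yhat = predict_linear(coef1, np.array(window[-p:]))
    recursive.append(yhat)
    window.append(yhat)

# Direct: a separate model per horizon, each using only observed lags
direct = [predict_linear(fit_linear(*make_lag_matrix(y, p, h=h)), y[-p:])
          for h in range(1, H + 1)]
print(np.round(recursive, 3), np.round(direct, 3))
```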
38. What is a Temporal Fusion Transformer (TFT)?
The Temporal Fusion Transformer is a deep learning architecture for multi-horizon time series forecasting that combines recurrent layers (LSTMs) with self-attention mechanisms. It handles multiple input types simultaneously: static covariates (fixed attributes like product category), known future inputs (calendar features, planned promotions), and observed past inputs (historical target values, exogenous variables). TFT produces interpretable attention weights that reveal which time steps and features matter most. Its authors reported state-of-the-art results on several benchmarks, and it is available in PyTorch Forecasting.
39. What is the N-BEATS model?
N-BEATS (Neural Basis Expansion Analysis for Time Series) is a pure deep learning model for univariate time series forecasting built from stacks of fully connected blocks with residual connections and basis function expansion. Its generic version uses no time series-specific components, and its interpretable version (N-BEATS-I) separates the forecast into trend and seasonality components. It does not require covariates or domain knowledge. N-BEATS was published after the M4 competition (it did not take part in it) and was reported to outperform the M4 winner on the M4 dataset, which made it an influential pure neural forecasting model. It is available in PyTorch Forecasting and Darts.
40. What is the Darts library and what does it offer?
Darts is a Python library for time series forecasting and anomaly detection that provides a unified API for a wide range of models: classical (ARIMA, Exponential Smoothing, Prophet), machine learning (XGBoost, LightGBM, regression models), and deep learning (LSTM, N-BEATS, Temporal Fusion Transformer, and others). Its consistent fit() / predict() interface and native support for multiple time series and covariates simplify model comparison. It also includes backtesting, residual diagnostics, scaling, and anomaly detection tools, which make it one of the most comprehensive Python forecasting libraries.
Practical Applications
41. How do you handle missing values in time series?
Missing values in time series should not be filled with the column mean, which ignores temporal structure. Methods include: forward fill (ffill), which carries the last known value forward (appropriate for step-like series such as prices that stay constant until they change); backward fill (bfill); linear interpolation, a straight line between the surrounding known values; spline interpolation, a smooth curve fit; seasonal interpolation, which uses the average of the same period in adjacent cycles; and model-based imputation (e.g., a state-space ARIMA model with Kalman smoothing to estimate the missing values). For long gaps, multiple imputation or flagging the gap as a feature is often better than single imputation. The pandas .interpolate() method supports many strategies.
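A pandas sketch of the simpler strategies on a hypothetical daily series with gaps. `method="time"` weights interpolation by the actual timestamp spacing, which matters for irregular indexes; here the index is regular, so it matches linear interpolation.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0], index=idx)

filled = pd.DataFrame({
    "ffill": s.ffill(),                          # carry last value forward
    "bfill": s.bfill(),                          # carry next value backward
    "linear": s.interpolate(method="linear"),    # straight line between neighbours
    "time": s.interpolate(method="time"),        # weighted by timestamp gaps
})
gap_flag = s.isna().astype(int)                  # keep a record of what was imputed
print(filled.assign(was_missing=gap_flag))
```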
42. How do you handle outliers in time series?
Outliers in time series (spikes due to data errors, promotions, or crises) can distort model estimation. Detection methods include STL decomposition (residuals exceeding ±3 SD), the IQR method on rolling windows, and isolation forests. Treatment options: replace with interpolated values (if clearly erroneous), retain them with an outlier dummy variable in a regression model (so the model learns the outlier's effect), use robust estimation methods (Theil-Sen regression, STL with robustness iterations), or clip to upper/lower bounds. Prophet is sensitive to large outliers; its documentation recommends removing them (setting them to missing), which Prophet handles without difficulty.
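A pandas sketch of a rolling median/MAD (Hampel-style) detector, a close cousin of the rolling-IQR method mentioned above, on a hypothetical linear series with one injected spike. The window of 7 and the 3-MAD threshold are illustrative choices; flagged points are masked and then linearly interpolated.

```python
import numpy as np
import pandas as pd

y = pd.Series(0.5 * np.arange(100))
y.iloc[50] += 100.0                      # injected spike

window = 7
med = y.rolling(window, center=True).median()
mad = y.rolling(window, center=True).apply(
    lambda w: np.median(np.abs(w - np.median(w))), raw=True)
# 1.4826 * MAD approximates the standard deviation for normally distributed data
is_outlier = (y - med).abs() > 3 * 1.4826 * mad

cleaned = y.mask(is_outlier).interpolate()   # replace flagged points, then interpolate
flagged = list(y.index[is_outlier])
print(flagged, cleaned.iloc[50])
```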
43. How do you create time series features for machine learning models?
ML models (XGBoost, LightGBM) have no built-in notion of time, so time series features must be engineered explicitly. Features include: lag features (values of Y at earlier time steps, Y_{t−k} for selected lags k), rolling statistics (7-day rolling mean, 30-day rolling std), expanding statistics (cumulative mean), date/time features (day of week, month, quarter, year, is_weekend, is_holiday), target encoding of categorical features, interaction terms between lags, Fourier terms for seasonality, and domain-specific features (weather, promotions, competitor prices). Every feature must use only information available at forecast time (for example, shift before computing rolling statistics, and use lags at least as long as the forecast horizon) to avoid leakage; very long lags also discard rows at the start of the training set.
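A pandas sketch of leakage-safe lag, rolling, and calendar features on hypothetical daily sales. The key detail is calling `shift(1)` before `rolling()` so that each row's rolling statistic uses only earlier observations.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=14, freq="D")
df = pd.DataFrame({"sales": np.arange(1.0, 15.0)}, index=idx)   # hypothetical daily sales

df["lag_1"] = df["sales"].shift(1)
df["lag_7"] = df["sales"].shift(7)
# shift(1) BEFORE rolling: the window ends at t-1 and never sees the target at t
df["roll_mean_3"] = df["sales"].shift(1).rolling(3).mean()
df["expanding_mean"] = df["sales"].shift(1).expanding().mean()
df["day_of_week"] = df.index.dayofweek
df["month"] = df.index.month
df["is_weekend"] = (df.index.dayofweek >= 5).astype(int)

train_ready = df.dropna()      # drop rows without a full feature history
print(train_ready.head(3))
```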
44. What is the difference between the Holt-Winters method and SARIMA?
Holt-Winters (Triple Exponential Smoothing) uses a simple set of smoothing equations to update level, trend, and seasonal components adaptively with each new observation. It is easy to implement, fast, and does not require stationarity testing. SARIMA is a statistical model that explicitly identifies the lag structure through ACF/PACF analysis and estimates parameters via maximum likelihood. SARIMA is generally more rigorous and provides statistical inference (confidence intervals, hypothesis tests) but requires more expertise to specify correctly. For business forecasting, Holt-Winters often performs competitively with SARIMA and is more accessible to non-statisticians.
45. How do you forecast multiple related time series?
Multiple related time series can be forecast independently (series-by-series), which ignores cross-series information, or jointly. Joint approaches include VAR (models interactions between stationary series), global models (fit a single ML or deep learning model across all series, sharing parameters — effective when series have similar patterns), cross-learning (train on many series to improve individual forecasts — demonstrated in the M4 and M5 competitions), and hierarchical reconciliation. Global models with LightGBM or N-BEATS trained across thousands of product-level series typically outperform individual ARIMA models when series are sparse.
46. What is the M-series forecasting competition and why does it matter?
The M-series competitions (M1 through M6), organised by Spyros Makridakis, are among the most important empirical benchmarks in forecasting. Key findings: simple methods (Theta, exponential smoothing) often outperformed more complex statistical and ML methods, notably in M3; combination/ensemble methods consistently perform well relative to individual models; the M4 competition (100,000 series) was won by ES-RNN, a hybrid of exponential smoothing and a recurrent neural network, with combination methods close behind; the M5 competition (Walmart sales data) was won by gradient boosting (LightGBM) models with rich features; the M6 competition evaluated forecasts of financial asset returns together with investment decisions. Results guide practical forecasting tool selection.
47. How do you forecast with external events (COVID, holidays)?
External events are incorporated as: (1) dummy variables (0/1) or step functions for one-time events in regression/ARIMAX models; (2) Prophet's holiday effects via add_country_holidays() or custom holiday dataframes; (3) regressors (weather, price, marketing spend) in ARIMAX or ML models; (4) changepoint detection to identify when events caused structural breaks; (5) pre/post event comparison using counterfactual models (e.g., CausalImpact by Google). For COVID, common strategies included excluding or down-weighting pandemic-period data when training post-pandemic models, or applying adjustment factors based on pre-COVID seasonal patterns.
48. What is anomaly detection in time series?
Time series anomaly detection identifies observations that deviate significantly from expected patterns. Methods include statistical approaches (Z-score, IQR, Grubbs test), model-based approaches (fit ARIMA, flag points where residuals exceed thresholds), isolation forest, LSTM autoencoder (high reconstruction error = anomaly), and STL decomposition (flag high residuals). Anomalies may be point anomalies (single outlier), contextual anomalies (normal in one context, abnormal in another — e.g., high traffic at 3 AM), or collective anomalies (a subsequence is anomalous). Anomaly detection is used in fraud detection, equipment failure prediction, and data quality monitoring.
49. What is nowcasting and how does it differ from forecasting?
Nowcasting estimates the current value of a target variable using related real-time data, for scenarios where the target is measured with a lag. For example, GDP is published quarterly and its first estimate appears only weeks to months after the quarter ends, so economists use high-frequency indicators (job postings, credit card transactions, electricity consumption) to estimate current-quarter GDP in real time. Nowcasting uses mixed-frequency models (MIDAS, Mixed-Data Sampling), bridge equations, and Kalman filter-based models. It is widely used in central banking, supply chain management, and healthcare surveillance. The key challenge is incorporating new information as it arrives at irregular intervals.
50. How do you choose between ARIMA, Prophet, and an ML model for a forecasting problem?
The choice depends on the problem's characteristics. Use ARIMA/SARIMA when the series is well-behaved with clear AR/MA structure, you have limited data, you need statistical inference (parameter tests, model-based prediction intervals), and the series has a single seasonality. Use Prophet when the series has multiple seasonality periods, holiday effects, or irregular missing data, or when non-statisticians need to adjust forecasts. Use ML models (XGBoost, LightGBM) when you have many related series, rich external features (promotions, weather, price), and large training data, and need to capture complex nonlinear relationships. Ensembles that combine approaches frequently outperform any single model in practice.