User Guide#
This guide introduces the main workflow of the forecast_evaluation package.
It complements the API reference and follows the usage patterns of the
example notebook, which contains the full worked example.
The package is designed around a small number of core tasks:
loading forecast and outturn data
filtering data to a consistent evaluation sample
visualising forecasts, errors, and revisions
computing accuracy, bias, and efficiency diagnostics
comparing models with benchmark forecasts
exploring results in an interactive dashboard
Quick Start#
import forecast_evaluation as fe
import pandas as pd
You can either load the built-in Forecast Evaluation Report data, or create a dataset from your own forecasts and outturns.
Load the built-in dataset#
data = fe.ForecastData(load_fer=True)
This is the quickest way to start exploring the package and reproducing the examples.
Create a dataset from your own data#
forecast_data = fe.ForecastData(
forecasts_data=forecasts_dataframe,
outturns_data=outturns_dataframe,
)
The central object is forecast_evaluation.data.ForecastData. It stores raw
forecasts, outturns, transformed series, and the main evaluation table used by the
analysis and plotting functions.
Data Requirements#
Forecasts and outturns are provided as pandas DataFrames.
Forecasts must include the standard identification columns together with a value:
date, vintage_date, variable, source, frequency, forecast_horizon, value
Outturns use the same structure but do not require a source column.
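Before constructing a ForecastData object, it can help to check that a table carries the columns listed above. The following is a minimal sketch in plain Python: the column names come from this guide, while the helper `missing_columns` and its `require_source` flag are hypothetical and not part of the package.

```python
# Column names as listed in this guide; the helper below is illustrative only.
FORECAST_COLUMNS = [
    "date", "vintage_date", "variable", "source",
    "frequency", "forecast_horizon", "value",
]

def missing_columns(columns, require_source=True):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    # Outturns do not need a source column, so it can be dropped from the check.
    required = [c for c in FORECAST_COLUMNS if require_source or c != "source"]
    return [c for c in required if c not in present]

outturn_columns = [
    "date", "vintage_date", "variable",
    "frequency", "forecast_horizon", "value",
]
print(missing_columns(outturn_columns))                        # ['source']
print(missing_columns(outturn_columns, require_source=False))  # []
```

The helper accepts any iterable of names, so a pandas column index such as `forecasts_dataframe.columns` could be passed directly.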
An example forecast table looks like this:
         date vintage_date variable source frequency  forecast_horizon  value
0  2014-12-31   2015-03-31      gdp   BVAR         Q                -1    100
1  2015-03-31   2015-03-31      gdp   BVAR         Q                 0    101
2  2015-06-30   2015-03-31      gdp   BVAR         Q                 1    102
3  2015-09-30   2015-03-31      gdp   BVAR         Q                 2    103
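In the example table, forecast_horizon matches the number of quarters between the target date and the vintage date. The sketch below reproduces that pattern for quarterly data. It is an assumption read off the table, not a statement of how the package computes horizons, and the helper names are hypothetical.

```python
from datetime import date

def quarter_index(d):
    """Map a date to a running quarter count (year * 4 + quarter number 0-3)."""
    return d.year * 4 + (d.month - 1) // 3

def horizon_in_quarters(target, vintage):
    """Quarters between the forecast target date and the vintage date."""
    return quarter_index(target) - quarter_index(vintage)

vintage = date(2015, 3, 31)
targets = [
    date(2014, 12, 31),
    date(2015, 3, 31),
    date(2015, 6, 30),
    date(2015, 9, 30),
]
horizons = [horizon_in_quarters(t, vintage) for t in targets]
print(horizons)  # [-1, 0, 1, 2]
```

A negative horizon then refers to a period that ended before the vintage date, and horizon 0 to the quarter in which the forecast was made.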
The package supports different forecast metrics such as levels, pop (period-on-period), and
yoy (year-on-year). When required, transformations between these representations are computed
internally when enough outturn history is available.
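To illustrate why outturn history is needed, the sketch below derives period-on-period and year-on-year growth from a short series of quarterly levels. It assumes simple percentage changes. The package may scale or define these metrics differently, and `pct_change` is a hypothetical helper.

```python
def pct_change(levels, lag):
    """Percentage change versus `lag` periods earlier; None where history is missing."""
    out = []
    for i, x in enumerate(levels):
        if i < lag:
            out.append(None)  # not enough history to compute the change
        else:
            out.append(100.0 * (x / levels[i - lag] - 1.0))
    return out

levels = [100.0, 101.0, 102.0, 103.0, 104.0]  # illustrative quarterly levels
pop = pct_change(levels, lag=1)  # period-on-period: compare with previous quarter
yoy = pct_change(levels, lag=4)  # year-on-year: compare with same quarter a year earlier
print(pop)
print(yoy)
```

With five quarterly observations, only the last one has a year-on-year value, because the first four lack an observation four quarters earlier.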
Working With ForecastData#
Load data and filter the sample#
The example notebook starts by loading FER data and filtering the variables used in subsequent analysis:
data = fe.ForecastData(load_fer=True)
data.filter(variables=["gdpkp", "cpisa", "aweagg"])
The forecast_evaluation.data.ForecastData.filter() method can restrict the
sample by:
forecast dates via start_date and end_date
vintage dates via start_vintage and end_vintage
variables, metrics, sources, and frequencies
a custom filtering function through custom_filter
For example, you can exclude the COVID period from the analysis using the built-in
covid_filter helper:
data_covid_filtered = data.copy()
data_covid_filtered.filter(custom_filter=fe.covid_filter)
If you need to reset to the original unfiltered data, use:
data.clear_filter()
Inspect the stored tables#
Useful accessors on a ForecastData object are:
data.df for the main evaluation table
data.forecasts for the transformed forecast table
data.outturns for the transformed outturn table
data.id_columns for the identification columns used to distinguish models
The summary() method prints a compact overview of the loaded variables, date
range, vintages, and horizons.
Adding forecasts and labels#
You can extend an existing dataset with additional forecasts.
The notebook demonstrates adding a new label column that is treated as part of the forecast identifier:
sample_forecasts = fe.create_sample_forecasts()
sample_forecasts["extra label"] = "Model family A"
data_example_extra_columns = fe.ForecastData(load_fer=True)
data_example_extra_columns.add_forecasts(
sample_forecasts,
extra_ids=["extra label"],
)
This is useful when you want to separate forecasts by model family, conditioning
assumption, scenario, or other metadata beyond source.
Visualisation Workflow#
The package includes plotting functions for vintages, forecast errors, outturns, revisions, and rolling diagnostics. The examples below mirror the notebook.
Recent forecast errors against their historical distribution#
dates_to_highlight = pd.date_range(
start="2022-01-01",
end="2024-12-31",
freq="QE",
)
fe.plot_forecast_error_density(
data=data,
horizon=4,
variable="cpisa",
metric="yoy",
frequency="Q",
source="mpr",
k=12,
highlight_dates=dates_to_highlight,
)
Vintage plots#
fe.plot_vintage(
data=data,
variable="cpisa",
forecast_source=["mpr", "compass conditional", "bvar conditional"],
frequency="Q",
vintage_date="2020-03-31",
metric="yoy",
)
Hedgehog charts#
fe.plot_hedgehog(
data=data,
variable="cpisa",
forecast_source="mpr",
metric="yoy",
frequency="Q",
k=12,
convert_to_percentage=True,
)
Forecast errors over time#
fe.plot_errors_across_time(
data_covid_filtered,
variable="gdpkp",
metric="yoy",
ma_window=4,
error="raw",
sources=["mpr", "baseline ar(p) model"],
k=12,
horizons=[0, 4],
)
Forecast errors by vintage or horizon#
fe.plot_forecast_errors(
data=data,
variable="cpisa",
metric="yoy",
frequency="Q",
source="mpr",
vintage_date_forecast="2022-03-31",
k=12,
convert_to_percentage=True,
)
fe.plot_forecast_errors_by_horizon(
data=data,
variable="cpisa",
metric="yoy",
frequency="Q",
source="mpr",
k=12,
convert_to_percentage=True,
)
Outturns and revisions#
fe.plot_outturns(
data=data,
variable="gdpkp",
metric="yoy",
frequency="Q",
k=[0, 12],
fill_k=True,
convert_to_percentage=True,
)
fe.plot_outturn_revisions(
data=data,
variable="gdpkp",
metric="yoy",
frequency="Q",
k=[4, 12],
ma_window=4,
fill_k=True,
convert_to_percentage=True,
)
Forecast Evaluation Methods#
Most analytical functions return a
forecast_evaluation.tests.results.TestResult object. These results can be
converted to a DataFrame and, in many cases, plotted directly with .plot().
Accuracy statistics#
Use forecast_evaluation.compute_accuracy_statistics() to calculate statistics
such as RMSE, mean absolute error, root median squared error, and observation counts.
accuracy_results = fe.compute_accuracy_statistics(data=data, k=12)
accuracy_results.plot(
variable="cpisa",
metric="yoy",
frequency="Q",
statistic="rmse",
convert_to_percentage=True,
)
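For reference, the statistics named above follow standard definitions. The sketch below computes them for a small list of forecast errors using only the standard library. The function name and dictionary keys are hypothetical and do not reflect the package's output format.

```python
import math
import statistics

def accuracy_summary(errors):
    """Standard accuracy statistics for a list of forecast errors."""
    squared = [e * e for e in errors]
    absolute = [abs(e) for e in errors]
    return {
        "rmse": math.sqrt(statistics.fmean(squared)),      # root mean squared error
        "mae": statistics.fmean(absolute),                 # mean absolute error
        "rmedse": math.sqrt(statistics.median(squared)),   # root median squared error
        "n": len(errors),                                  # observation count
    }

errors = [0.5, -1.0, 0.25, 2.0]  # illustrative errors
summary = accuracy_summary(errors)
print(summary)
```

The root median squared error is less sensitive to a single large miss than RMSE, which is why the two can rank models differently.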
Comparing to a benchmark#
You can compare model performance relative to a benchmark using summary functions and companion plots.
accuracy_comparison = fe.compare_to_benchmark(
df=accuracy_results,
benchmark_model="baseline ar(p) model",
statistic="rmse",
)
fe.plot_compare_to_benchmark(
df=accuracy_results,
variable="cpisa",
metric="yoy",
frequency="Q",
benchmark_model="baseline ar(p) model",
statistic="rmse",
)
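One common way to express performance relative to a benchmark is the ratio of each model's statistic to the benchmark's, where a value below 1 means the model beats the benchmark on that statistic. The sketch below assumes this ratio form. Whether compare_to_benchmark reports ratios or differences is not stated in this guide, and the helper and numbers are illustrative.

```python
def relative_to_benchmark(stat_by_model, benchmark_model):
    """Divide each model's statistic by the benchmark's statistic."""
    benchmark = stat_by_model[benchmark_model]
    return {
        model: value / benchmark
        for model, value in stat_by_model.items()
        if model != benchmark_model
    }

# Made-up RMSE values for a single variable and horizon.
rmse_by_model = {"mpr": 0.8, "baseline ar(p) model": 1.0}
ratios = relative_to_benchmark(rmse_by_model, "baseline ar(p) model")
print(ratios)  # {'mpr': 0.8}
```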
To create a compact table for selected horizons:
comparison_table = fe.create_comparison_table(
df=accuracy_results.to_df(),
variable="cpisa",
metric="yoy",
frequency="Q",
benchmark_model="baseline ar(p) model",
statistic="rmse",
horizons=[0, 1, 2, 4, 8, 12],
)
Relative accuracy tests#
The package includes Diebold-Mariano testing and rolling-window extensions.
diebold_mariano_results = fe.diebold_mariano_table(
data=data,
benchmark_model="mpr",
)
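The idea behind the Diebold-Mariano test can be sketched in a few lines: form the loss differential between two sets of forecast errors and test whether its mean is zero. The version below assumes squared-error loss and no autocorrelation correction, which is only appropriate for one-step-ahead forecasts. It is a simplified illustration, not the package's implementation.

```python
import math
import statistics

def diebold_mariano_stat(errors_model, errors_benchmark):
    """Simplified DM statistic under squared-error loss (no HAC correction)."""
    # Loss differential: negative values favour the model over the benchmark.
    d = [a * a - b * b for a, b in zip(errors_model, errors_benchmark)]
    n = len(d)
    mean_d = statistics.fmean(d)
    var_d = statistics.variance(d)  # sample variance of the differential
    return mean_d / math.sqrt(var_d / n)

# Illustrative errors for two forecasters over the same five periods.
errors_model = [0.5, -0.2, 0.1, 0.4, -0.3]
errors_benchmark = [1.0, -0.6, 0.3, 0.9, -0.5]
dm = diebold_mariano_stat(errors_model, errors_benchmark)
print(dm)
```

Under the null of equal accuracy the statistic is approximately standard normal in large samples. For multi-step forecasts the variance would normally be replaced by a long-run (HAC) estimate.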
For rolling analysis, first create a focused dataset and then pass the test function
to forecast_evaluation.rolling_analysis().
forecast_data_dm_rolling = data.copy()
forecast_data_dm_rolling.filter(
variables=["gdpkp"],
metrics=["yoy"],
sources=["mpr", "baseline random walk model"],
)
rolling_dm = fe.rolling_analysis(
data=forecast_data_dm_rolling,
window_size=40,
analysis_func=fe.diebold_mariano_table,
analysis_args={"benchmark_model": "mpr"},
)
rolling_dm.plot(variable="gdpkp", horizons=[0, 4])
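Conceptually, a rolling analysis applies the same function to successive fixed-length windows of the sample. The sketch below shows that pattern on a plain list. How rolling_analysis defines its windows, and the units of window_size, are not described in this guide, so treat `rolling_apply` as a hypothetical stand-in.

```python
def rolling_apply(observations, window_size, func):
    """Apply `func` to every contiguous window of length `window_size`."""
    return [
        func(observations[start:start + window_size])
        for start in range(len(observations) - window_size + 1)
    ]

errors = [1.0, -1.0, 2.0, 0.0, -2.0]  # illustrative forecast errors
rolling_mean = rolling_apply(
    errors,
    window_size=3,
    func=lambda window: sum(window) / len(window),
)
print(rolling_mean)
```

A sample of length n yields n - window_size + 1 windows, so a window of 40 observations needs a reasonably long evaluation sample to produce a useful path.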
Fluctuation tests#
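Fluctuation tests extend the rolling analysis above. They evaluate the test statistic over every window and compare it with critical values that allow for the repeated testing, which a sequence of separate rolling tests does not account for.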
rolling_dm_fluctuation = fe.fluctuation_tests(
data=forecast_data_dm_rolling,
window_size=40,
test_func=fe.diebold_mariano_table,
test_args={"benchmark_model": "mpr"},
)
rolling_dm_fluctuation.plot(variable="gdpkp", horizons=[0, 4])
Bias and efficiency analysis#
The notebook also demonstrates the main econometric diagnostics provided by the package.
bias_results = fe.bias_analysis(data=data, source="mpr", k=12, verbose=False)
bias_results.plot(variable="aweagg", source="mpr", metric="yoy", frequency="Q")
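# data_gdp is not defined earlier in this guide; here it is assumed to be a GDP-only copy
data_gdp = data.copy()
data_gdp.filter(variables=["gdpkp"])
rolling_bias = fe.rolling_analysis(
data=data_gdp,
window_size=40,
analysis_func=fe.bias_analysis,
analysis_args={"k": 12},
)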
bl_results = fe.blanchard_leigh_horizon_analysis(
data=data,
source="mpr",
outcome_variable="cpisa",
outcome_metric="yoy",
instrument_variable="gdpkp",
instrument_metric="yoy",
)
weak_efficiency_results = fe.weak_efficiency_analysis(
data=data,
source="mpr",
k=12,
verbose=False,
)
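As background to the bias diagnostic, the simplest version of a bias test checks whether the mean forecast error differs from zero. The sketch below computes the mean error and a plain t-statistic. It assumes independent errors and is not the package's implementation, which may use a regression framework and robust standard errors.

```python
import math
import statistics

def mean_error_t_stat(errors):
    """Return the mean forecast error and its t-statistic against zero."""
    n = len(errors)
    mean = statistics.fmean(errors)
    standard_error = statistics.stdev(errors) / math.sqrt(n)
    return mean, mean / standard_error

errors = [0.3, 0.1, 0.4, 0.2, 0.5]  # illustrative errors, all the same sign
mean_error, t_stat = mean_error_t_stat(errors)
print(mean_error, t_stat)
```

A mean error that is large relative to its standard error suggests forecasts that are systematically too high or too low, depending on the sign convention used for errors.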
Revisions analysis#
Revisions can be analysed directly through dedicated tests and plots.
revisions_correlation_results = fe.revisions_errors_correlation_analysis(
data=data,
source="mpr",
k=12,
)
revisions_predictable_results = fe.revision_predictability_analysis(
data=data,
frequency="Q",
n_revisions=5,
)
fe.plot_average_revision_by_period(
data=data,
source="mpr",
variable="gdpkp",
metric="yoy",
frequency="Q",
)
Adding Benchmark Forecasts#
You can augment a ForecastData object with simple benchmark models using
forecast_evaluation.data.ForecastData.add_benchmarks().
Supported benchmark families are AR (autoregressive) and random_walk.
data.add_benchmarks(metric="pop", models=["AR", "random_walk"])
Optional arguments let you restrict the benchmark generation to selected variables or frequencies, control the number of forecast periods, and supply an estimation start date.
Density Forecasts#
For probabilistic forecasts with quantiles, use
forecast_evaluation.data.DensityForecastData, which extends
ForecastData.
Density forecast input must include a quantile column with values between 0 and 1.
density_df = fe.create_sample_density_forecasts()
density_data = fe.DensityForecastData(forecasts_data=density_df)
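A quick pre-check of quantile input can catch problems before loading. The sketch below verifies that quantiles lie between 0 and 1, as required above. It also flags values that decrease as the quantile increases, which is an extra sanity check assumed here rather than a requirement stated in this guide. The helper is hypothetical.

```python
def check_quantile_rows(rows):
    """Check (quantile, value) pairs for one forecast; return a list of problems."""
    problems = []
    for q, _ in rows:
        if not 0.0 <= q <= 1.0:
            problems.append(f"quantile {q} is outside [0, 1]")
    # Assumed extra check: values should not fall as the quantile rises.
    ordered = sorted(rows)
    for (q_lo, v_lo), (q_hi, v_hi) in zip(ordered, ordered[1:]):
        if v_hi < v_lo:
            problems.append(f"values cross between quantiles {q_lo} and {q_hi}")
    return problems

good_rows = [(0.1, 1.0), (0.5, 2.0), (0.9, 3.5)]
bad_rows = [(0.1, 2.0), (0.5, 1.5), (1.2, 3.0)]
print(check_quantile_rows(good_rows))  # []
print(check_quantile_rows(bad_rows))
```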
You can also add density forecasts to an existing object:
density_data = fe.DensityForecastData()
density_data.add_density_forecasts(density_df)
Density forecast objects retain the standard forecast and outturn workflow while also
exposing a density_forecasts table for quantile-level analysis.
Dashboard#
The package includes an interactive dashboard for exploring forecasts, errors, and analysis outputs.
Run it from a ForecastData object:
data.run_dashboard()
When working inside a notebook, you can embed the dashboard in the notebook output:
data.run_dashboard(from_jupyter=True)
Further Reading#
For a worked example covering the main plotting and testing functions, see the example notebook in the repository. For function signatures and parameter-level details, refer to the API reference.