In forecasting, ‘forecast skill’ is any measure of forecast accuracy. We offer several skill metrics that emphasize different aspects of “accuracy”. If a metric measures absolute model error, lower values are better. A ‘skill metric’ has a range of 0 to 1, where 0 = totally accurate. A ‘skill score’ measures relative skill between forecasting systems: it is the percentage improvement over a reference forecast (or ‘benchmark’). Skill scores range from -∞ to +100%, where 0% means the forecast and the reference forecast have equal skill; the higher the skill score, the greater the improvement over the selected benchmark. For example, a skill score of +20% means the forecast's error metric is 20% lower than the reference's.

Backtesting

Salient models are backtested over an extended period of 20 to 30 years to verify performance. A forecast has "skill" if it corresponds to what actually happened. Model skill metrics are calculated by comparing all Salient forecasts to the ERA5 reanalysis dataset.
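As a concrete illustration of that comparison, the sketch below scores a deterministic forecast against co-located ERA5 values using mean absolute error. The arrays are hypothetical; the sketch assumes the forecast and reanalysis values have already been aligned on the same times and locations.

```python
import numpy as np

# Hypothetical, already-aligned values: one entry per (time, location) pair.
forecast = np.array([21.3, 18.9, 25.1, 30.2])   # forecast temperature, degC
era5_obs = np.array([20.8, 19.5, 24.0, 31.0])   # ERA5 "truth", degC

# Mean absolute error: an absolute-error metric, so lower is better.
mae = np.mean(np.abs(forecast - era5_obs))
print(f"MAE vs. ERA5: {mae:.2f} degC")
```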

Evaluating probabilistic forecast models effectively requires a long backtesting period. A longer period provides a more robust assessment of the model's performance across a variety of scenarios: different seasons, economic cycles, and extreme events. By considering a diverse range of historical conditions, decision-makers gain a holistic understanding of the model's accuracy and reliability over time.

Metric

Salient offers a suite of skill metrics to assess forecast error. We offer metrics that assess the error of full probabilistic distributions (CRPS, the Continuous Ranked Probability Score), categorical forecasts (RPS, the Ranked Probability Score), and simple deterministic forecast values (MAE, the Mean Absolute Error).
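To make the probabilistic case concrete, here is a minimal NumPy sketch of CRPS for an ensemble forecast, using the standard energy-form identity CRPS = E|X − y| − ½·E|X − X′|. This is an illustrative implementation, not Salient's internal code.

```python
import numpy as np

def crps_ensemble(members: np.ndarray, obs: float) -> float:
    """CRPS of an ensemble forecast against a single scalar observation.

    Energy form: CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X' are
    independent draws from the forecast distribution (here, the ensemble).
    """
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# A sharp, well-centered ensemble scores lower (better) than a diffuse one.
obs = 22.0
print(crps_ensemble(np.array([21.5, 22.0, 22.5]), obs))  # ~0.11
print(crps_ensemble(np.array([15.0, 22.0, 29.0]), obs))  # ~1.56
```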

Skill Scores

A metric alone does not convey relative skill: temperature error, for example, is always low in the tropics compared to the mid-latitudes because the variance is much smaller. Each available skill metric therefore has a corresponding Skill Score, which measures the percent improvement of the Salient forecast model over a reference forecast.

For example,

$$ \mathrm{CRPSS} = 1 - \frac{\mathrm{CRPS}_{\mathrm{forecast}}}{\mathrm{CRPS}_{\mathrm{reference}}} $$
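A quick numeric sketch of the same formula (the CRPS values below are made up for illustration):

```python
def crpss(crps_forecast: float, crps_reference: float) -> float:
    """Skill score: fractional improvement of the forecast over the reference."""
    return 1.0 - crps_forecast / crps_reference

# Illustrative values only: the forecast's CRPS is 20% lower than the
# reference's, so the skill score is +20%.
print(f"{crpss(0.8, 1.0):+.0%}")   # +20%
# A forecast worse than the reference yields a negative score.
print(f"{crpss(1.2, 1.0):+.0%}")   # -20%
```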

Skill scores are available per combination of variable, location, season, and lead time, so you can easily quantify the percentage improvement of Salient over the selected reference for the scope that matters. Used this way, skill scores provide a simple visualization of where Salient outperforms the best available alternatives. The viewer automatically defaults to the ‘best available alternative’ for the selected timescale and lead; however, you can easily change the reference model from the drop-down menu if desired.

Modeling Cross-Validation Process

Some users are concerned that backtesting results in our data-driven models could benefit from the inclusion of future observations. Salient's cross-validation scheme strictly holds out post-2015 data as its "test" dataset: skill statistics and the 2015-2023 hindcasts are produced without any post-2015 observations in training.

Salient employs a rigorous k-fold cross-validation (CV) process for model training and validation. K-fold cross-validation provides a robust evaluation of the model's performance by averaging results across multiple train-test splits. This mitigates the risk of overfitting and ensures a fair assessment of the model's ability to generalize to unseen data.

Figure 1: 5-year K-Fold Cross-Validation Scheme

In our approach, shown in Figure 1, the validation dataset (1980-2014) is divided into seven equal parts of 5 years each (folds 0-6, yellow dots), and the model is trained and tested in seven separate iterations. In each iteration, the years indicated by blue dots are used for training, and the years highlighted by yellow dots are forecast. This ensures that each part of the validation dataset is forecast exactly once, and that the years being forecast are left out of training. The test data (green dots, 2015-present) are never used for training: they are always kept separate, and forecast only in the final hindcast, fold 7. In this manner, all hindcasts are left-out predictions. We then evaluate performance metrics over the combined validation and test sets; however, we also keep the test set as the strictest check of real-time performance. Fold 8 represents the training data for real-time forecasts only.
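The fold layout itself is easy to reproduce. The sketch below builds the seven 5-year validation folds (1980-2014) plus the held-out test years; it is a simplified illustration of the scheme in Figure 1, not Salient's training code.

```python
import numpy as np

validation_years = np.arange(1980, 2015)   # 35 years -> 7 folds x 5 years
test_years = np.arange(2015, 2024)         # held out; never used for training
folds = validation_years.reshape(7, 5)     # folds 0-6

for k, held_out in enumerate(folds):
    # Train on every validation year *except* this fold, then hindcast the fold.
    train = np.setdiff1d(validation_years, held_out)
    print(f"fold {k}: train on {len(train)} years, "
          f"forecast {held_out.min()}-{held_out.max()}")

# Fold 7: train on all validation years, hindcast the test years (2015-present).
# Fold 8: train on everything; used only for real-time forecasts.
```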

/hindcast_summary

The split_set parameter

hindcast_summary has three options for split_set, which controls the time period over which the skill scores are calculated. Full details are in the API documentation:

https://api.salientpredictions.com/documentation/api/#/Validation/hindcast_summary
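For reference, here is a minimal sketch of calling the endpoint with Python's requests library. The split_set value shown, any other query parameters, and the authentication scheme are assumptions for illustration only; consult the linked API documentation for the actual signature.

```python
import requests

# Hypothetical request: everything except the endpoint path and the
# `split_set` parameter name is an assumption; see the API docs above.
response = requests.get(
    "https://api.salientpredictions.com/hindcast_summary",
    params={"split_set": "all"},     # illustrative value only
    auth=("username", "password"),   # placeholder credentials
)
response.raise_for_status()
print(response.json())
```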