In forecasting, ‘forecast skill’ is any measure of forecast accuracy. We offer several skill metrics that emphasize different aspects of “accuracy”. If a metric measures absolute model error, lower values are better. A ‘skill metric’ has a range of 0 to 1, where 0 = totally accurate. A ‘skill score’ measures relative skill between forecasting systems: it is the percentage improvement over a reference forecast (or ‘benchmark’). Skill scores range from -∞ to +100%, where 0% means the forecast and the reference forecast have equal skill; the higher the skill score, the greater the improvement over the selected benchmark. For example, a skill score of +20% means the forecast's error metric is 20% lower than the reference's.

Backtesting

Salient models are backtested over an extended period of 20 to 30 years to verify performance. A forecast has "skill" if it corresponds to what actually happened. Model skill metrics are calculated by comparing all Salient forecasts to the ERA5 reanalysis dataset.
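As a concrete illustration of that comparison, the sketch below scores a deterministic forecast against co-located ERA5 values using mean absolute error. The arrays are hypothetical; the sketch assumes the forecast and reanalysis values have already been aligned on the same times and locations.

```python
import numpy as np

# Hypothetical, already-aligned values: one entry per (time, location) pair.
forecast = np.array([21.3, 18.9, 25.1, 30.2])   # forecast temperature, degC
era5_obs = np.array([20.8, 19.5, 24.0, 31.0])   # ERA5 "truth", degC

# Mean absolute error: an absolute-error metric, so lower is better.
mae = np.mean(np.abs(forecast - era5_obs))
print(f"MAE vs. ERA5: {mae:.2f} degC")
```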

Evaluating probabilistic forecast models effectively requires a long backtesting period. A longer period provides a more robust assessment of the model's performance across a variety of scenarios: different seasons, economic cycles, and extreme events. By considering a diverse range of historical conditions, decision-makers gain a holistic understanding of the model's accuracy and reliability over time.

Metric

Salient offers a suite of skill metrics to assess forecast error. We offer metrics that assess the error of full probabilistic distributions (CRPS, the Continuous Ranked Probability Score), categorical forecasts (RPS, the Ranked Probability Score), and simple deterministic forecast values (MAE, the Mean Absolute Error).
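To make the probabilistic case concrete, here is a minimal NumPy sketch of CRPS for an ensemble forecast, using the standard energy-form identity CRPS = E|X − y| − ½·E|X − X′|. This is an illustrative implementation, not Salient's internal code.

```python
import numpy as np

def crps_ensemble(members: np.ndarray, obs: float) -> float:
    """CRPS of an ensemble forecast against a single scalar observation.

    Energy form: CRPS = E|X - y| - 0.5 * E|X - X'|, where X and X' are
    independent draws from the forecast distribution (here, the ensemble).
    """
    members = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(members - obs))
    term2 = 0.5 * np.mean(np.abs(members[:, None] - members[None, :]))
    return term1 - term2

# A sharp, well-centered ensemble scores lower (better) than a diffuse one.
obs = 22.0
print(crps_ensemble(np.array([21.5, 22.0, 22.5]), obs))  # ~0.11
print(crps_ensemble(np.array([15.0, 22.0, 29.0]), obs))  # ~1.56
```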

Skill Scores

A metric alone does not convey relative skill: temperature error, for example, is always low in the tropics compared to the mid-latitudes because the variance is much smaller. Each available skill metric therefore has a corresponding Skill Score, which measures the percent improvement of the Salient forecast model over a reference forecast.

For example,

$$ \mathrm{CRPSS} = 1 - \frac{\mathrm{CRPS}_{\mathrm{forecast}}}{\mathrm{CRPS}_{\mathrm{reference}}} $$
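A quick numeric sketch of the same formula (the CRPS values below are made up for illustration):

```python
def crpss(crps_forecast: float, crps_reference: float) -> float:
    """Skill score: fractional improvement of the forecast over the reference."""
    return 1.0 - crps_forecast / crps_reference

# Illustrative values only: the forecast's CRPS is 20% lower than the
# reference's, so the skill score is +20%.
print(f"{crpss(0.8, 1.0):+.0%}")   # +20%
# A forecast worse than the reference yields a negative score.
print(f"{crpss(1.2, 1.0):+.0%}")   # -20%
```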

Skill scores are available per combination of variable, location, season, and lead time, so you can easily quantify the percentage improvement of Salient over the selected reference for the scope that matters. Used this way, skill scores provide a simple visualization of where Salient outperforms the best available alternatives. The viewer automatically defaults to the ‘best available alternative’ for the selected timescale and lead; however, you can easily change the reference model from the drop-down menu if desired.

Modeling Cross-Validation Process

Some users are concerned that backtesting results in our data-driven models could benefit from the inclusion of future observations. Salient's cross-validation scheme strictly holds out post-2015 data as its "test" dataset: skill statistics and the 2015-2023 hindcasts are produced without any post-2015 observations in training.

Salient employs a rigorous k-fold cross-validation (CV) process for model training and validation. K-fold cross-validation provides a robust evaluation of the model's performance by averaging results across multiple train-test splits. This mitigates the risk of overfitting and ensures a fair assessment of the model's ability to generalize to unseen data.

Figure 1: 5-year K-Fold Cross-Validation Scheme

In our approach, shown in Figure 1, the validation dataset (1980-2014) is divided into seven equal parts of 5 years each (folds 0-6, yellow dots), and the model is trained and tested in seven separate iterations. In each iteration, the years indicated by blue dots are used for training, and the years highlighted by yellow dots are forecast. This ensures that each part of the validation dataset is forecast exactly once, and that the years being forecast are left out of training. The test data (green dots, 2015-present) are never used for training: they are always kept separate, and forecast only in the final hindcast, fold 7. In this manner, all hindcasts are left-out predictions. We then evaluate performance metrics over the combined validation and test sets; however, we also keep the test set as the strictest check of real-time performance. Fold 8 represents the training data for real-time forecasts only.
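The fold layout itself is easy to reproduce. The sketch below builds the seven 5-year validation folds (1980-2014) plus the held-out test years; it is a simplified illustration of the scheme in Figure 1, not Salient's training code.

```python
import numpy as np

validation_years = np.arange(1980, 2015)   # 35 years -> 7 folds x 5 years
test_years = np.arange(2015, 2024)         # held out; never used for training
folds = validation_years.reshape(7, 5)     # folds 0-6

for k, held_out in enumerate(folds):
    # Train on every validation year *except* this fold, then hindcast the fold.
    train = np.setdiff1d(validation_years, held_out)
    print(f"fold {k}: train on {len(train)} years, "
          f"forecast {held_out.min()}-{held_out.max()}")

# Fold 7: train on all validation years, hindcast the test years (2015-present).
# Fold 8: train on everything; used only for real-time forecasts.
```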

/hindcast_summary

The split_set parameter

hindcast_summary has three options for split_set, which controls the time period over which the skill scores are calculated. Full details are in the API documentation:

https://api.salientpredictions.com/documentation/api/#/Validation/hindcast_summary
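For reference, here is a minimal sketch of calling the endpoint with Python's requests library. The split_set value shown, any other query parameters, and the authentication scheme are assumptions for illustration only; consult the linked API documentation for the actual signature.

```python
import requests

# Hypothetical request: everything except the endpoint path and the
# `split_set` parameter name is an assumption; see the API docs above.
response = requests.get(
    "https://api.salientpredictions.com/hindcast_summary",
    params={"split_set": "all"},     # illustrative value only
    auth=("username", "password"),   # placeholder credentials
)
response.raise_for_status()
print(response.json())
```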