Abstract
The fractions skill score (FSS) is a widely used metric for assessing forecast skill, with applications ranging from precipitation to volcanic ash forecasts. By comparing the fractions of grid squares exceeding a threshold within a neighborhood, rather than matching individual grid squares, it is intended to avoid the pitfalls of pixelwise comparisons and to identify the length scales at which a forecast has skill. The FSS is typically interpreted relative to a “useful” criterion: a forecast is considered skillful if its score exceeds a simple reference value. However, the reference value in common use is problematic, since it is not derived in a way that gives it an obvious meaning, it does not scale with neighborhood size, and it may not be exceeded even by forecasts that have skill. We therefore provide a new method for determining forecast skill from the FSS by deriving an expression for the FSS achieved by a random forecast, which yields a more robust and meaningful reference value for comparison. Through illustrative examples, we show that this new method considerably changes the length scales at which a forecast would be regarded as skillful and reveals subtleties in how the FSS should be interpreted.
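For readers who wish to experiment with these reference values, the sketch below is an illustrative Python implementation, not code from this paper. It computes the FSS on a toy displaced-patch example, compares it with the conventional “useful” target 0.5 + f0/2 of Roberts and Lean (2008), which does not vary with neighborhood size, and estimates the FSS of a random forecast by Monte Carlo sampling with the observed base rate, as a stand-in for the closed-form expression derived here. The function name fss, the toy fields, and the choice of 50 random samples are assumptions introduced purely for illustration.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst_exceed, obs_exceed, n):
    """Fractions skill score for an n x n square neighborhood.

    fcst_exceed and obs_exceed are boolean (or 0/1) exceedance fields."""
    # Neighborhood fractions: share of exceeding grid squares in each n x n window
    pf = uniform_filter(fcst_exceed.astype(float), size=n, mode="constant")
    po = uniform_filter(obs_exceed.astype(float), size=n, mode="constant")
    num = np.mean((pf - po) ** 2)              # MSE of the fraction fields
    den = np.mean(pf ** 2) + np.mean(po ** 2)  # reference MSE
    return 1.0 - num / den if den > 0 else np.nan

rng = np.random.default_rng(0)

# Toy example: a 20 x 20 observed event patch and a spatially displaced forecast
obs = np.zeros((100, 100)); obs[40:60, 40:60] = 1.0
fcst = np.zeros((100, 100)); fcst[45:65, 50:70] = 1.0
obs_b, fcst_b = obs >= 0.5, fcst >= 0.5
f0 = obs_b.mean()                              # observed base rate

for n in (1, 5, 11, 21, 41):
    score = fss(fcst_b, obs_b, n)
    # Conventional "useful" target (Roberts and Lean 2008): 0.5 + f0/2,
    # which notably does not depend on the neighborhood size n.
    fss_useful = 0.5 + f0 / 2.0
    # Empirical random-forecast reference: mean FSS of random exceedance
    # fields with the same base rate as the observations.
    fss_random = np.mean([fss(rng.random(obs.shape) < f0, obs_b, n)
                          for _ in range(50)])
    print(f"n={n:3d}  FSS={score:.3f}  useful={fss_useful:.3f}  random~{fss_random:.3f}")
```

Comparing the forecast FSS against the sampled random-forecast reference at each neighborhood size, rather than against the fixed target, illustrates how the conclusion about the scale at which a forecast becomes skillful can change.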
Significance Statement
Forecast verification metrics are crucial for assessing accuracy and identifying where forecasts can be improved. In this work, we investigate a popular verification metric, the fractions skill score, and derive a more robust method for deciding whether a forecast is sufficiently skillful. This new method substantially improves the insights that can be drawn from the score.