Sensitivity of inferred climate model skill to evaluation decisions: a case study using CMIP5 evapotranspiration

C.R. Schwalm, D.N. Huntzinger, A.M. Michalak, J.B. Fisher, J.S. Kimball, B. Mueller, K. Zhang and Y. Zhang

Observational datasets are often used to evaluate the skill of climate and ecosystem models. Such skill metrics can then be used to assess the relative quality of several models, to rank them, and to assess the likely uncertainty associated with their predictions of other variables. Here we show that inferred model skill and model ranking are highly sensitive to choices made in the evaluation itself (e.g. reference data product, land mask, etc.), indicating that model skill and rank should be viewed as probability distributions rather than as deterministic quantities, which has implications for model intercomparison studies.

Figure: (a) Distribution of ranks of CMIP5 models resulting from the use of different choices of reference product, land mask, time period, regridding algorithm, and spatial resolution. Note that the rank of each model is highly uncertain because it is sensitive to how the rank is evaluated. Black squares indicate the median, blue diamonds the interquartile range, and red circles the 2.5th and 97.5th percentiles. (b) Ranking of the impact of various types of benchmarking choices on inferred model skill for several metrics.


Confrontation of climate models with observationally based reference datasets is widespread and integral to model development. These comparisons yield skill metrics quantifying the mismatch between simulated and reference values and also involve analyst choices, or meta-parameters, in structuring the analysis. Here, we systematically vary five such meta-parameters (reference dataset, spatial resolution, regridding approach, land mask, and time period) in evaluating evapotranspiration (ET) from eight CMIP5 models in a factorial design that yields 68 700 intercomparisons. The results show that while model–data comparisons can provide some feedback on overall model performance, model ranks are ambiguous, and inferred model skill and rank are highly sensitive to the choice of meta-parameters for all models. This suggests that model skill and rank are best represented probabilistically rather than as scalar values. For this case study, the choice of reference dataset is found to have a dominant influence on inferred model skill, even larger than the choice of model itself. This is primarily due to large differences between reference datasets, indicating that further work toward a community-accepted standard ET reference dataset is crucial in order to decrease ambiguity in model skill.
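The factorial design described above can be sketched in a few lines: enumerate every combination of meta-parameter choices, rank the models under each combination, and collect each model's ranks into a distribution. The sketch below is purely illustrative, with hypothetical meta-parameter options and a placeholder skill function standing in for an actual comparison of simulated and reference ET fields; it is not the authors' code, and the combination counts do not reproduce the paper's 68 700 intercomparisons.

```python
import itertools
import random
from collections import defaultdict

# Hypothetical meta-parameter options (illustrative stand-ins, not the
# paper's actual lists of reference products, masks, etc.)
meta_params = {
    "reference_dataset": ["ref_A", "ref_B", "ref_C"],
    "resolution_deg": [0.5, 1.0, 2.0],
    "regridding": ["bilinear", "nearest", "conservative"],
    "land_mask": ["common", "per_dataset"],
    "time_period": ["1989-1995", "1996-2005"],
}

models = [f"model_{i}" for i in range(1, 9)]  # eight CMIP5 models

random.seed(0)

def skill(model, combo):
    """Placeholder skill metric. In a real evaluation this would be,
    e.g., the RMSE between a model's ET field and the reference ET
    field after applying the given regridding, mask, and time period."""
    return random.random()

# Factorial design: every combination of meta-parameter choices.
combos = list(itertools.product(*meta_params.values()))  # 3*3*3*2*2 = 108

# Rank the models under each combination (rank 1 = best skill here),
# then collect each model's rank across all combinations.
ranks = defaultdict(list)
for combo in combos:
    ordered = sorted(models, key=lambda m: skill(m, combo))
    for rank, model in enumerate(ordered, start=1):
        ranks[model].append(rank)

# Each model now has a *distribution* of ranks (one per combination),
# rather than a single deterministic rank.
```

Summarizing each model's `ranks` list by its median, interquartile range, and 2.5th/97.5th percentiles yields exactly the kind of per-model rank distribution shown in panel (a) of the figure.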

Schwalm, C.R., D.N. Huntzinger, A.M. Michalak, J.B. Fisher, J.S. Kimball, B. Mueller, K. Zhang, Y. Zhang (2013) "Sensitivity of inferred climate model skill to evaluation decisions: a case study using CMIP5 evapotranspiration", Environmental Research Letters, 8, 024028, doi:10.1088/1748-9326/8/2/024028.