Benchmarks

Five benchmark data sets are provided to illustrate the strengths of different statistical error measures.

Users are invited to develop their own statistics to provide more meaningful analysis of data.
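As an illustration of what a user-defined statistic might look like, the sketch below computes the mean absolute error over only those time steps where the observed flow exceeds a threshold. The function name `high_flow_mae` and the `threshold` parameter are hypothetical and are not part of HydroTest.

```python
def high_flow_mae(observed, modelled, threshold):
    """Mean absolute error restricted to time steps with observed flow above `threshold`.

    Hypothetical example of a user-defined statistic; not a HydroTest measure.
    """
    errors = [abs(o - m) for o, m in zip(observed, modelled) if o > threshold]
    if not errors:
        raise ValueError("no observations exceed the threshold")
    return sum(errors) / len(errors)


# Only the second and third time steps exceed the threshold of 5.
print(high_flow_mae([1.0, 10.0, 20.0], [1.0, 12.0, 17.0], threshold=5.0))
```

A statistic of this kind would be expected to separate models such as B and D, whose errors are concentrated at one end of the flow range.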

The five models are:

  • Model A - Naive - in which the model simply predicts the observed data from six previous time steps.
  • Model B - Low Flow - models the low flow events accurately but is unable to predict high flow events.
  • Model C - Noisy - models the observed flow reasonably well overall but consistently over- or under-predicts the observed flow by 15%.
  • Model D - High Flow - models the high flow events accurately but is unable to model low flows.
  • Model E - Offset - artificially generated data set in which the modelled data are simply offset from the observed data by 180 time units.
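To make the descriptions above concrete, the following sketch builds series of the same general kind as Models A, C and E from an arbitrary observed series. It is not how the benchmark files were produced. The function names, the alternating over/under pattern for the noisy case and the cyclic wrap-around for the offset case are all assumptions made for illustration.

```python
def naive_forecast(observed, lag=6):
    """Predict each value as the observation `lag` time steps earlier.

    The first `lag` steps have no prediction, so the result is shorter
    than the input by `lag` values.
    """
    return [observed[t - lag] for t in range(lag, len(observed))]


def noisy_forecast(observed, fraction=0.15):
    """Over- and under-predict by a fixed fraction.

    Assumption: the sign alternates at each time step; the real benchmark
    may choose the sign differently.
    """
    return [o * (1 + fraction if i % 2 == 0 else 1 - fraction)
            for i, o in enumerate(observed)]


def offset_forecast(observed, shift=180):
    """Shift the series by `shift` time units (assumption: cyclic wrap-around)."""
    n = len(observed)
    return [observed[(t - shift) % n] for t in range(n)]


series = list(range(10))
print(naive_forecast(series, lag=6))
print(offset_forecast([1, 2, 3, 4], shift=2))
```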

The first four benchmark data sets are available within a single text file (tab delimited) that can be downloaded by right clicking here. Model E data are available here.
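A tab-delimited file of this kind could be read with Python's standard `csv` module, as in the sketch below. The layout is an assumption: a header row followed by numeric columns, with column names such as `Observed` and `ModelA` invented here for the in-memory sample. Check the downloaded file and adjust accordingly.

```python
import csv
import io


def read_tab_delimited(stream):
    """Read a tab-delimited table into a dict of column name -> list of floats.

    Assumes the first row is a header and every other cell is numeric.
    """
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)
    columns = {name: [] for name in header}
    for row in reader:
        if not row:  # skip blank lines
            continue
        for name, value in zip(header, row):
            columns[name].append(float(value))
    return columns


# In-memory stand-in for the downloaded file (hypothetical column names).
sample = io.StringIO("Observed\tModelA\n10.0\t9.5\n12.0\t11.0\n")
data = read_tab_delimited(sample)
print(data["Observed"], data["ModelA"])
```

For a file on disk, the same function can be given the object returned by `open(path, newline="")`.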

The HydroTest site calculated the following error measures for these data (the 'best' result for each statistic among the first four models is highlighted in bold):

Statistic     Model A    Model B    Model C    Model D    Model E
AME               176        211      75.45      75.25    1000.00
PDIFF               0        202      -74.3          0          0
MAE           28.5188    27.5688    42.8728    45.4266   636.6036
ME            -3.4938    27.5688    -0.0534   -45.4266     0.0000
RMSE          42.2550    67.4775    44.5832    52.5474   707.1068
R4MS4E        70.8016   111.6074    48.6038    57.5843   782.5423
NSC                 9          0        159          0          5
RAE            0.4851     0.4690     0.7293     0.7728     2.0000
PEP                 0    40.1590   -14.7714          0          0
MARE           0.0947     0.0612     0.1500     0.1837   119.0621
MdAPE          8.2802     0.0000    15.0000    25.0000    96.8909
MRE           -0.0230     0.0612     0.0000    -0.1837  -118.3333
MSRE           0.0152     0.0208     0.0225     0.0447    1039290
RVE           -0.0122     0.0965    -0.0002    -0.1589          0
RSqr           0.7461     0.5604     0.7704     0.9240     1.0000
CE             0.7314     0.3151     0.7010     0.5847     -3.000
IoA d          0.9276     0.6283     0.9311     0.8852          0
PI           -20.2462   -53.1806    -22.652    -31.857     -13142
MSLE           0.0179     0.0326     0.0230     0.0357     8.9433
MSDE          93.6604    57.2075   7,963.63    10.2115   152.1637
IRMSE          4.6245     7.3850     4.8793     5.7510   114.6462
VE             0.9002     0.9035     0.8500     0.8411    -0.2732
KGE            0.8613     0.3302     0.8132     0.7347    -1.0000
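A few of the measures in the table can be sketched in Python using their commonly cited definitions, with the error taken as observed minus modelled. These definitions and the sign convention are assumptions, since HydroTest's exact formulations are not reproduced on this page. The test series is also an assumption: a synthetic 360-step sinusoid with its modelled counterpart half a cycle out of phase, not the actual Model E file.

```python
import math


def mean(values):
    return sum(values) / len(values)


def mae(observed, modelled):
    """Mean absolute error."""
    return mean([abs(o - m) for o, m in zip(observed, modelled)])


def me(observed, modelled):
    """Mean error (observed minus modelled)."""
    return mean([o - m for o, m in zip(observed, modelled)])


def rmse(observed, modelled):
    """Root mean squared error."""
    return math.sqrt(mean([(o - m) ** 2 for o, m in zip(observed, modelled)]))


def ce(observed, modelled):
    """Coefficient of efficiency (Nash-Sutcliffe form)."""
    obs_mean = mean(observed)
    residual = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    variation = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - residual / variation


def rsqr(observed, modelled):
    """Squared Pearson correlation coefficient."""
    om, mm = mean(observed), mean(modelled)
    cov = sum((o - om) * (m - mm) for o, m in zip(observed, modelled))
    var_o = sum((o - om) ** 2 for o in observed)
    var_m = sum((m - mm) ** 2 for m in modelled)
    return cov ** 2 / (var_o * var_m)


# Synthetic series (assumed shape): amplitude 500 about a mean of 1000,
# one step per degree, modelled series shifted by half a cycle.
angles = [math.radians(t) for t in range(360)]
observed = [1000.0 + 500.0 * math.sin(a) for a in angles]
modelled = [1000.0 + 500.0 * math.sin(a + math.pi) for a in angles]

print(round(rmse(observed, modelled), 4))
print(round(ce(observed, modelled), 4))
print(round(rsqr(observed, modelled), 4))
```

For this synthetic pair the RMSE is 1000/√2 (about 707.1068), CE is -3 and RSqr is 1, which is consistent with the Model E column above. It also shows why a perfect RSqr alone says little about model quality when the two series are out of phase.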

Additional Datasets

Four more datasets are provided that encompass different kinds of error - Bias error, Volumetric error, Timing error and Signal Strength error - as illustrated below. The dataset for all four benchmarks is available here.

Bias Error

Volumetric Error

Timing Error

Signal Strength Error
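One plausible reading of the four error types named above is sketched below as simple transformations of an observed series. These interpretations are assumptions made for illustration (a constant shift for bias, a multiplicative factor for volume, a cyclic time shift for timing, and damping about the mean for signal strength); the actual benchmark datasets may have been constructed differently. All function names are hypothetical.

```python
def add_bias(observed, offset):
    """Bias error (assumed form): every value shifted by a constant."""
    return [o + offset for o in observed]


def scale_volume(observed, factor):
    """Volumetric error (assumed form): every value scaled, changing total volume."""
    return [o * factor for o in observed]


def shift_timing(observed, steps):
    """Timing error (assumed form): series delayed by `steps`, wrapping around."""
    n = len(observed)
    return [observed[(t - steps) % n] for t in range(n)]


def damp_signal(observed, gain):
    """Signal strength error (assumed form): variation about the mean scaled by `gain`."""
    centre = sum(observed) / len(observed)
    return [centre + gain * (o - centre) for o in observed]


flows = [0.0, 10.0, 20.0, 10.0]
print(add_bias(flows, 3.0))
print(scale_volume(flows, 0.5))
print(shift_timing(flows, 1))
print(damp_signal(flows, 0.5))
```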

 

   
      Copyright © 2018 Christian W Dawson