Details (modelling and statistics)

The modelling:

Single / Multiple analysis

It is possible to analyse one modelled file at a time or multiple modelled files. To analyse one file click on the 'Single' button. To analyse a number of modelled files, enter the appropriate number of files in the 'Number of models to analyse:' field and click the 'Multiple' button. Because this analysis may involve a large number of files, the results are not shown on screen, but are saved in a text file that can be viewed and downloaded. Note - there may be a limit to the size/number of files that can be simultaneously analysed. HydroTest has been tested with 300 multiple files containing 1,000 data points each.

If you wish to analyse the output of a number of models simultaneously, the data should be held in separate columns (tab or comma delimited) - one column for each model. These can be stored in the same file as the observed data (the observed data must be in the first column) or in a separate modelled data file.
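
For example, a comma-delimited file holding the observed data in the first column and two modelled series alongside it might begin like this (illustrative values only):

    12.1,11.8,12.5
    13.4,13.0,13.9
    11.7,12.2,11.1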

In the summary results of multiple files the following results are also presented:

Missing A: The number of missing values in the file before being compared with the observed data.
<> Range A: Number of values outside the pre-entered bounds before being compared with the observed data.
Missing B: The number of missing values when the data have been analysed against the observed data.
<> Range B: Number of values outside the pre-entered bounds when the data have been analysed against the observed data.
Rel. Miss.: The number of missing values that could not be calculated (due to divide by zero problems) when calculating MSRE.

Missing data value

If your data set(s) contain missing values these should be represented by a missing value code rather than blanks or empty lines. The default value for representing missing data is -999 but you can change this to whatever code number you feel is appropriate.

If, during the analysis, the algorithm comes across one of these missing data points the value will be skipped and will not be included in the analysis. If an observed data point is missing the corresponding modelled data point will also be skipped irrespective of whether it is missing or not (for example, during the calculation of the modelled mean, some valid modelled values may be excluded from the calculation if their corresponding observed data point is missing).
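
As a sketch of this pairwise exclusion (assuming the default missing-value code of -999):

    MISSING = -999

    def paired_values(obs, mod, missing=MISSING):
        # Keep only time steps where both values are valid; a missing observed value
        # drops the corresponding modelled value as well (and vice versa)
        return [(o, m) for o, m in zip(obs, mod) if o != missing and m != missing]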

Null data and non-numeric data

Please ensure that your data do not contain any non-numeric or null (empty lines) values - particularly at the end of your data files. Your files should also not include any headers (column labels). Any non-numeric data will be interpreted by the system as zeros and will cause incorrect results. If in doubt - check the 'Total number of data points' analysed in the final results screen - this value should correspond with the number of data points in your files.
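
A quick pre-upload check along these lines can flag offending lines (an illustrative sketch; tab- or comma-delimited rows of numbers are assumed):

    def non_numeric_lines(lines):
        # Return the 1-based indices of lines with any field that cannot be read as a number
        bad = []
        for i, line in enumerate(lines, start=1):
            for field in line.replace(',', '\t').split('\t'):
                try:
                    float(field)
                except ValueError:
                    bad.append(i)
                    break
        return bad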

POT (Peaks Over Threshold)

If a value is entered in the Threshold (POT) field in step 2 of the Analysis process, HydroTest will calculate the Peaks Over Threshold for the observed and modelled data. The default value is zero. Note that values must be strictly greater than the entered value to be counted.
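
A sketch of such a count (counting every value above the threshold, per the strict inequality noted above):

    def peaks_over_threshold(values, threshold=0.0):
        # Count values strictly greater than the threshold
        return sum(1 for v in values if v > threshold)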

The Statistics:

Each statistic is listed below, followed by a description.

Absolute Maximum Error:

This metric records in real units the magnitude of the largest positive or negative error that the model has produced. It is a non-negative metric that has no upper bound and for a perfect model the result would be zero. It does not attempt to represent in a direct manner the level of overall agreement between the two datasets, and individual outliers can have a marked influence or produce a misleading effect. The measure is nevertheless useful in situations where it is important to establish whether or not a particular environmental threshold has been exceeded, i.e. a maximum permitted error.
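
For illustration, here and in the entries that follow are minimal Python sketches of the standard formulae, assuming obs and mod are equal-length lists of paired observed and modelled values with missing data already excluded; they are not necessarily HydroTest's exact implementation.

    def absolute_maximum_error(obs, mod):
        # Largest absolute difference between any pair of observed and modelled values
        return max(abs(m - o) for o, m in zip(obs, mod))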

Peak Difference:

This metric records in real units how well the highest output value in the modelled dataset matches the highest recorded value in the observed dataset. It is a signed metric that has no upper bound and for a perfect model the result would be zero. As a signed metric this measure indicates whether or not the forecasts are biased, i.e. whether a systematic error exists such that the forecasts tend to be disproportionately positive or negative. The metric is positive if the model over-estimates the observed peak and negative if it under-estimates it.
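
A sketch under the same assumptions, with the sign convention (modelled peak minus observed peak) following the description above:

    def peak_difference(obs, mod):
        # Positive if the modelled peak exceeds the observed peak, negative otherwise
        return max(mod) - max(obs)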

Mean Absolute Error:

This metric records in real units the level of overall agreement between the observed and modelled datasets. It is a non-negative metric that has no upper bound and for a perfect model the result would be zero. It provides no information about under-estimation or over-estimation. It is not weighted towards high(er) or low(er) magnitude events, but instead evaluates all deviations from the observed values equally and regardless of sign. MAE is comparable to the total sum of absolute residuals.
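
A sketch under the same assumptions:

    def mean_absolute_error(obs, mod):
        # Mean of the absolute differences between paired values
        return sum(abs(m - o) for o, m in zip(obs, mod)) / len(obs)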

Mean Error:

This signed metric records in real units the level of overall agreement between the observed and modelled datasets. It is unbounded and for a perfect model the result would be zero. However, a low score does not necessarily indicate a good model in terms of accurate forecasts, since positive and negative errors will tend to cancel each other out and, for this reason, MAE is often preferred to ME.
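
A sketch under the same assumptions; the sign convention (modelled minus observed) is an assumption:

    def mean_error(obs, mod):
        # Mean of the signed differences; positive and negative errors can cancel
        return sum(m - o for o, m in zip(obs, mod)) / len(obs)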

Root Mean Squared Error:

This metric records in real units the level of overall agreement between the observed and modelled datasets. It is a non-negative metric that has no upper bound and for a perfect model the result would be zero.
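
A sketch under the same assumptions:

    def root_mean_squared_error(obs, mod):
        # Square root of the mean of the squared differences
        return (sum((m - o) ** 2 for o, m in zip(obs, mod)) / len(obs)) ** 0.5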

Fourth Root of the Mean Quadrupled Error:

This metric records in real units the level of overall agreement between the observed and modelled datasets. It is a non-negative metric that has no upper bound and for a perfect model the result would be zero.
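
A sketch under the same assumptions:

    def fourth_root_mean_quadrupled_error(obs, mod):
        # Fourth root of the mean of the differences raised to the fourth power
        return (sum((m - o) ** 4 for o, m in zip(obs, mod)) / len(obs)) ** 0.25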

Akaike and Bayesian Information Criteria (AIC and BIC):

The Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are model selection metrics in which some traditional evaluation measure is adjusted according to the number of free parameters in each model, p, and the number of data points that were used in its calibration, m.
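
Commonly cited forms based on RMSE are sketched below; whether HydroTest uses these exact forms is an assumption:

    from math import log

    def akaike_information_criterion(rmse, p, m):
        # One common form: m * ln(RMSE) + 2p (assumed)
        return m * log(rmse) + 2 * p

    def bayesian_information_criterion(rmse, p, m):
        # One common form: m * ln(RMSE) + p * ln(m) (assumed)
        return m * log(rmse) + p * log(m)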

Number of Sign Changes:

This metric comprises a simple sequential count of the number of instances in which the sign of the residual changes throughout each series. It is a non-negative count with no fixed upper bound, and for a perfect model the result would be zero (although a score of zero does not necessarily imply a perfect model).
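
A sketch under the same assumptions; how zero residuals are treated is an assumption:

    def number_of_sign_changes(obs, mod):
        # Count how many times consecutive residuals have opposite signs
        res = [m - o for o, m in zip(obs, mod)]
        return sum(1 for a, b in zip(res, res[1:]) if a * b < 0)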

Relative Absolute Error:

This metric comprises the total absolute error made relative to what the total absolute error would have been if the forecast had simply been the mean of the observed values. It is a non-negative metric that has no upper bound and for a perfect model the result would be zero.
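
A sketch under the same assumptions:

    def relative_absolute_error(obs, mod):
        # Total absolute error relative to that of a mean-of-observed forecast
        obs_mean = sum(obs) / len(obs)
        return (sum(abs(m - o) for o, m in zip(obs, mod))
                / sum(abs(o - obs_mean) for o in obs))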

Percent Error in Peak:

This metric comprises the difference between the highest value in the modelled dataset and the highest value in the observed dataset, made relative to the magnitude of the highest value in the observed dataset, and expressed as a percentage. It can be either positive or negative. It is unbounded and for a perfect model the result would be zero. Positive values of PEP denote an over-estimate of the peak; negative values denote an under-estimate of the peak.
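
A sketch under the same assumptions:

    def percent_error_in_peak(obs, mod):
        # Peak difference relative to the observed peak, expressed as a percentage
        return 100.0 * (max(mod) - max(obs)) / max(obs)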

Mean Absolute Relative Error:

This metric comprises the mean of the absolute error made relative to the observed record. It is a non-negative metric that has no upper bound and for a perfect model the result would be zero.
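
A sketch under the same assumptions (division by a zero observed value is left unhandled here):

    def mean_absolute_relative_error(obs, mod):
        # Mean of the absolute errors, each divided by the corresponding observed value
        return sum(abs(m - o) / o for o, m in zip(obs, mod)) / len(obs)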

Median Absolute Percentage Error:

This metric comprises the median of the absolute error made relative to the observed record. It is a non-negative metric that has no upper bound and for a perfect model the result would be zero.
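
A sketch under the same assumptions:

    from statistics import median

    def median_absolute_percentage_error(obs, mod):
        # Median of the absolute relative errors, expressed as percentages
        return median(abs(m - o) / o * 100.0 for o, m in zip(obs, mod))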

Mean Relative Error:

This metric comprises the mean of the error made relative to the observed record. It is a signed metric that has no upper bound and for a perfect model the result would be zero.
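
A sketch under the same assumptions; the sign convention is assumed:

    def mean_relative_error(obs, mod):
        # Mean of the signed errors, each divided by the corresponding observed value
        return sum((m - o) / o for o, m in zip(obs, mod)) / len(obs)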

Mean Squared Relative Error:

This metric comprises the mean of the squared relative error in which relative error is error made relative to the observed record. It is a non-negative metric that has no upper bound and for a perfect model the result would be zero.
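
A sketch under the same assumptions (pairs with a zero observed value would be excluded and counted as Rel. Miss., as described above):

    def mean_squared_relative_error(obs, mod):
        # Mean of the squared relative errors
        return sum(((m - o) / o) ** 2 for o, m in zip(obs, mod)) / len(obs)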

Relative Volume Error (also known as Deviation of Runoff Volumes, Dv):

This signed metric comprises the total error made relative to the total observed record. It is unbounded and for a perfect model the result would be zero.
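
A sketch under the same assumptions; the sign convention is assumed:

    def relative_volume_error(obs, mod):
        # Total signed error relative to the total observed volume
        return sum(m - o for o, m in zip(obs, mod)) / sum(obs)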

R-squared statistic (Coefficient of Determination; Pearson's r squared):

This metric comprises the squared ratio of the combined dispersion of the two series to the total dispersion of the observed and modelled series. It ranges from zero to one and for a perfect model the result would be one.
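
A sketch under the same assumptions:

    def r_squared(obs, mod):
        # Square of Pearson's correlation coefficient between the two series
        n = len(obs)
        mean_o, mean_m = sum(obs) / n, sum(mod) / n
        cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(obs, mod))
        var_o = sum((o - mean_o) ** 2 for o in obs)
        var_m = sum((m - mean_m) ** 2 for m in mod)
        return cov ** 2 / (var_o * var_m)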

Coefficient of Efficiency (Nash-Sutcliffe efficiency):

This popular metric has several aliases. It is one minus the ratio of the sum of squared errors (SSE) to the variance of the observed dataset about its mean. CE is intended to range from zero to one but negative scores are also permitted. The maximum positive score of one represents a perfect model; a value of zero indicates that the model is no better than a one-parameter "no knowledge" model in which the forecast is the mean of the observed series at all time steps; negative scores are unbounded and indicate that the model is performing worse than a "no knowledge" model.
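
A sketch under the same assumptions:

    def coefficient_of_efficiency(obs, mod):
        # One minus SSE relative to the variance of the observations about their mean
        obs_mean = sum(obs) / len(obs)
        sse = sum((m - o) ** 2 for o, m in zip(obs, mod))
        return 1.0 - sse / sum((o - obs_mean) ** 2 for o in obs)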

Index of Agreement measure (d):

This metric is one minus the ratio of sum square error (SSE) to "potential error", in which potential error represents the sum of the "largest quantification" that can be obtained for each individual forecast with respect to the mean of the observed dataset. It ranges from zero to one, with one representing perfect agreement.
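
A sketch under the same assumptions, using the standard Willmott form of the potential error term (an assumption about the exact form used here):

    def index_of_agreement(obs, mod):
        # One minus SSE relative to the 'potential error' about the observed mean
        obs_mean = sum(obs) / len(obs)
        sse = sum((m - o) ** 2 for o, m in zip(obs, mod))
        pot = sum((abs(m - obs_mean) + abs(o - obs_mean)) ** 2 for o, m in zip(obs, mod))
        return 1.0 - sse / pot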

Persistence Index:

The use of this metric is on the increase and it is often referred to in published papers under different aliases. It is one minus the ratio of sum square error (SSE) to what the sum square error would have been if the forecast had been the last observed value. The maximum positive score of one represents a perfect model. Note that this is a one-step Persistence Index - it is not appropriate for models that are predicting over a longer range. It is also not appropriate for analysing data sets that are not in sequential time series order (for example, if the data have been sorted into a random order before analysis).
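
A sketch under the same assumptions, with the series in time order; summing from the second time step is an assumption about the exact indexing:

    def persistence_index(obs, mod):
        # One minus SSE relative to the SSE of a naive 'last observed value' forecast
        sse = sum((m - o) ** 2 for o, m in zip(obs[1:], mod[1:]))
        naive = sum((b - a) ** 2 for a, b in zip(obs, obs[1:]))
        return 1.0 - sse / naive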

Mean Squared Logarithmic Error:

Similar to RMSE but compares the logged values of the observed and modelled data. It is more suitable for low flows than RMSE as it is based on logarithmic transformations (de Vos and Rientjes, 2007).
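
A sketch under the same assumptions; the use of natural logarithms is an assumption, and values must be positive:

    from math import log

    def mean_squared_logarithmic_error(obs, mod):
        # Mean squared difference between the logs of observed and modelled values
        return sum((log(o) - log(m)) ** 2 for o, m in zip(obs, mod)) / len(obs)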

Mean Squared Derivative Error:

According to de Vos and Rientjes (2007) the MSDE 'expresses the difference between the first-order derivatives of the simulated and the observed discharge, which is equal to the difference in residuals between two successive time steps'. It provides a good indicator of fit to the hydrograph shape.
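
A sketch under the same assumptions, with the series in time order:

    def mean_squared_derivative_error(obs, mod):
        # Mean squared difference between the one-step changes of the two series
        d_obs = [b - a for a, b in zip(obs, obs[1:])]
        d_mod = [b - a for a, b in zip(mod, mod[1:])]
        return sum((dm - do) ** 2 for do, dm in zip(d_obs, d_mod)) / len(d_obs)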

Inertia Root Mean Squared Error:

Used by the Russian Hydrometeorological Center (Appolov et al., 1974; Popov, 1968). Analyses the performance of a model against the so-called inertia forecast. Values of IRMSE below 0.8 are satisfactory, and below 0.7 are regarded as good. Note that this is a one-step inertia measure - it is not appropriate for models that are predicting over a longer range. It is also not appropriate for analysing data sets that are not in sequential time series order (for example, if the data have been sorted into a random order before analysis).
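
One plausible formulation, sketched under the same assumptions, divides the model RMSE by the standard deviation of the one-step changes in the observed series; the exact definition used by HydroTest is an assumption here:

    def inertia_root_mean_squared_error(obs, mod):
        # Model RMSE scaled by the variability of the observed one-step changes (assumed form)
        n = len(obs)
        rmse = (sum((m - o) ** 2 for o, m in zip(obs, mod)) / n) ** 0.5
        deltas = [b - a for a, b in zip(obs, obs[1:])]
        mean_d = sum(deltas) / len(deltas)
        sd_d = (sum((d - mean_d) ** 2 for d in deltas) / len(deltas)) ** 0.5
        return rmse / sd_d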

Volumetric Efficiency:

Represents the fraction of water delivered at the proper time (Criss and Winston, 2008). Theoretically VE can range from minus infinity for the worst case to 1 for a perfect model.
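
A sketch under the same assumptions, following Criss and Winston (2008):

    def volumetric_efficiency(obs, mod):
        # One minus the total absolute error relative to the total observed volume
        return 1.0 - sum(abs(m - o) for o, m in zip(obs, mod)) / sum(obs)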

Kling-Gupta efficiency:

An improvement to the Nash-Sutcliffe efficiency proposed by Gupta et al. (2009). It is defined as KGE = 1 - ED, where ED = sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), r is the linear correlation coefficient between the observed and modelled series, alpha is the ratio of the modelled to the observed standard deviation, and beta is the ratio of the modelled to the observed mean. A value of one represents a perfect model.
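
A sketch under the same assumptions, following the 2009 formulation:

    def kling_gupta_efficiency(obs, mod):
        # KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2)
        n = len(obs)
        mean_o, mean_m = sum(obs) / n, sum(mod) / n
        sd_o = (sum((o - mean_o) ** 2 for o in obs) / n) ** 0.5
        sd_m = (sum((m - mean_m) ** 2 for m in mod) / n) ** 0.5
        r = sum((o - mean_o) * (m - mean_m) for o, m in zip(obs, mod)) / (n * sd_o * sd_m)
        ed = ((r - 1) ** 2 + (sd_m / sd_o - 1) ** 2 + (mean_m / mean_o - 1) ** 2) ** 0.5
        return 1.0 - ed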

   
      Copyright © 2018 Christian W Dawson