Statistical Assessment of a Numerical Model Simulating Agro Hydro-chemical Processes in Soil under Drip Fertigated Mandarin Tree

The advent of high speed computers, which enable enhanced modelling capabilities and rapid development in mathematical software, has transformed the mathematical evaluation of natural processes. Models are extensively used in almost every field of science for problem solving and decision making, and the vadose zone of agricultural soils is not spared from this revolutionary change. Due to the advancement of micro-irrigation systems such as sprinkler and/or surface/subsurface drip irrigation, which has transformed irrigation and fertilizer practices, there is an increasing interest in evaluating and optimizing these high frequency systems for water and fertilizer use efficiency [1-3].

The R² values varied in a narrow range (0.50 to 0.59). Similarly, values for E (0.12-0.43), IA (0.80-0.84), E₁ (0.26-0.32) and IA₁ (0.61-0.69) suggest that the model precisely predicted water content, salinity and nitrate concentration over the season. However, E_rel (-319.25) and IA_rel (-71.3) values were highly negative for nitrate concentration, indicating a mismatch. It was concluded that none of the evaluated measures described and tested the performance of the model for water, salinity and nitrate ideally. Each criterion had its specific advantages and disadvantages, which should be taken into account. Hence, sound model performance evaluation requires the use of a combination of different statistical criteria, which consider both absolute and relative errors. Judicious use of statistical criteria should lead to improvements in the modelling assessment of water, salinity and nitrate dynamics in soil under cropped conditions.

Irrigation water was supplied through a surface drip system, with drip lines placed at a distance of 60 cm on both sides of a tree line. The laterals had 1.6 L.h⁻¹ pressure compensating drippers spaced at an interval of 40 cm. Irrigation was performed weekly, and the total seasonal irrigation was 432.8 mm. The salinity of the irrigation water (EC_w) was monitored daily, and ranged between 0.09 and 0.19 dS.m⁻¹, well below the EC_w threshold for irrigation of orange, a close relative of mandarin (1.1 dS.m⁻¹). Daily water content measurements were performed using Sentek® EnviroSCAN® capacitance soil water sensors, and soil water was sampled on a weekly basis using SoluSAMPLERs™ [35]. The extracted soil solution was analysed to determine soil solution salinity (EC_sw) and nitrate-nitrogen (NO₃⁻-N) content.

Modelling technique
The HYDRUS-2D software package was used to simulate the transient two-dimensional movement of water and solutes in the soil [11]. Refer to the HYDRUS technical manual for a detailed description of the governing equations describing variably-saturated flow using the Richards equation, solute transport using the advection-dispersion equation, and root water uptake, as well as the various initial and boundary conditions that can be implemented. In this approach, the drip tubing was considered as a line source, because in this twin line drip irrigation system the wetted patterns from adjacent drippers merge to form a continuous wetted strip along both sides of the tree [36,37]. Modelled observation nodes corresponded to the locations where EnviroSCAN probes (at depths of 10, 25, 50, 80, and 110 cm) and SoluSAMPLERs (at depths of 25, 50, 100, and 150 cm) were installed. Soil hydraulic properties were described using the van Genuchten-Mualem constitutive relationships [38]. The spatial root distribution is defined in HYDRUS-2D according to Vrugt et al. [39]. We considered a simple root distribution model, in which the roots of mandarin trees expanded horizontally into all available space between the tree lines (x_m = 200 cm), were concentrated mainly below the drip emitter (x* = 60 cm, z* = 20 cm) where water and nutrients were applied, and extended to a depth of 60 cm (z_m = 60 cm).
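As an illustration of the van Genuchten-Mualem relationships mentioned above, the following minimal sketch evaluates water content and unsaturated hydraulic conductivity as functions of pressure head. The function names and the loam-like parameter values are assumptions chosen for illustration only; they are not the calibrated values used in this study, and the pore-connectivity parameter l = 0.5 is the commonly used default rather than a value reported here.

```python
def vg_effective_saturation(h, alpha, n):
    """Effective saturation Se for pressure head h (cm, negative when unsaturated)."""
    if h >= 0.0:
        return 1.0
    m = 1.0 - 1.0 / n
    return (1.0 + (alpha * abs(h)) ** n) ** (-m)


def vg_water_content(h, theta_r, theta_s, alpha, n):
    """Volumetric water content (cm3/cm3) from the van Genuchten retention curve."""
    se = vg_effective_saturation(h, alpha, n)
    return theta_r + (theta_s - theta_r) * se


def vgm_conductivity(h, k_s, alpha, n, l=0.5):
    """Unsaturated hydraulic conductivity from the Mualem pore-size model."""
    se = vg_effective_saturation(h, alpha, n)
    m = 1.0 - 1.0 / n
    return k_s * se ** l * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2


# Hypothetical loam-like parameters (assumption, not taken from the study).
PARAMS = dict(theta_r=0.078, theta_s=0.43, alpha=0.036, n=1.56)
K_S = 24.96  # cm/day, illustrative only

for h in (0.0, -100.0, -1000.0):
    theta = vg_water_content(h, **PARAMS)
    k = vgm_conductivity(h, K_S, PARAMS["alpha"], PARAMS["n"])
    print(f"h = {h:8.1f} cm  theta = {theta:.3f}  K = {k:.3e} cm/day")
```

Both functions return the saturated values at h ≥ 0 and decrease monotonically as the soil dries, which is the behaviour the simulated water contents depend on.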
Reduction of root water uptake due to water stress was described using the piecewise linear relation developed by Feddes et al. [40]. The following parameters in the Feddes et al. model were used: h₁ = −10, h₂ = −25, h₃ = −200 to −1000, and h₄ = −8000 cm, which were taken from Taylor and Ashcroft for orange. Reduction of root water uptake due to salinity stress, α₂(h_ϕ), was described by adopting the Maas and Hoffman salinity threshold and slope function [40,41]. The salinity threshold (EC_T) for orange (closely related to mandarin) corresponds to a value of the electrical conductivity of the saturation extract (EC_e) of 1.7 dS.m⁻¹, and a slope (s) of 16%.
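The two stress response functions described above can be sketched as follows. This is a simplified illustration, not the HYDRUS implementation: the function names are hypothetical, h₃ is passed as a single value (the text gives a range of −200 to −1000 cm, and how the model selects a value within that range is not covered here), and the salinity function assumes its input is expressed on the same basis as the threshold.

```python
def feddes_alpha(h, h1=-10.0, h2=-25.0, h3=-200.0, h4=-8000.0):
    """Piecewise linear water stress reduction factor (0 to 1) for pressure head h (cm)."""
    if h >= h1 or h <= h4:
        return 0.0                      # too wet (anaerobic) or beyond wilting
    if h > h2:
        return (h1 - h) / (h1 - h2)     # rises from 0 at h1 to 1 at h2
    if h >= h3:
        return 1.0                      # optimal uptake between h2 and h3
    return (h - h4) / (h3 - h4)         # falls from 1 at h3 to 0 at h4


def maas_hoffman_alpha(ec, threshold=1.7, slope_pct=16.0):
    """Threshold-slope salinity reduction factor; ec and threshold in dS/m."""
    if ec <= threshold:
        return 1.0
    return max(0.0, 1.0 - slope_pct / 100.0 * (ec - threshold))


for h in (-5.0, -17.5, -100.0, -4100.0, -9000.0):
    print(f"h = {h:8.1f} cm  alpha_water = {feddes_alpha(h):.2f}")
for ec in (1.0, 2.7, 9.0):
    print(f"EC = {ec:4.1f} dS/m  alpha_salt = {maas_hoffman_alpha(ec):.2f}")
```

With the parameters quoted in the text, uptake is unrestricted between −25 and −200 cm and below 1.7 dS.m⁻¹, and each additional dS.m⁻¹ above the threshold reduces uptake by 16%.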
The longitudinal dispersivity (ε_L) was considered to be 20 cm, and the transverse dispersivity (ε_T) was taken as one-tenth of ε_L, as optimised in similar studies involving solute transport in soils [12,20]. Since NH₄NO₃ and mono-ammonium phosphate were the fertilizers used in our study, nitrification of NH₄⁺-N to NO₃⁻-N was assumed to be the main N process occurring in the soil. HYDRUS-2D incorporates this process by means of a sequential first-order decay chain.
A time-variable flux boundary condition was applied to a 20 cm long boundary directly below the dripper, centred 60 cm from the top left corner of the soil domain. During irrigation, the drip line boundary was held at a constant water flux, q. The atmospheric boundary condition was assumed for the remainder of the soil surface during periods of irrigation, and for the entire soil surface during periods between irrigation. A no-flow boundary condition was

improvements to the modelling approach through adjustment of model parameter values, model structural modifications, the inclusion of additional observational information, and representation of important spatial and temporal characteristics of the domain; (3) to compare current modelling efforts with previous studies [23]. Field calibration and validation of the model require conducting tests based on statistical measures, and are the most important aspect of testing the goodness of fit of values generated by the model. This process of assessing the performance of a model requires evaluation of the closeness of the simulated behaviour of the model to field measurements made within the domain.
Exhaustive evaluations and objective analyses have been carried out for models used in various fields of hydrological and hydraulic modelling [23-30]. Given the wide recognition and utility of HYDRUS for modelling water and solute movement under irrigation, there is a need to assess the performance of HYDRUS for potential sources of deviation using appropriate and simple indicators. Most field evaluation studies using HYDRUS either present graphical comparisons or subjective assessments [15,31-33], or consider only limited error and correlation estimates as an objective assessment [16,18,20,34] of the performance of the model in simulating water, salt and nitrate movement in soils. These criteria may place emphasis only on a particular behaviour of the model, and may not be able to assess the overall efficacy of the model on a long term basis. Hence, there is a need to evaluate the performance of HYDRUS more rigorously, utilizing different error analyses, tests of significance, regression analyses, and efficiency testing to clearly assess the model's sustained performance. It is important to compare the suitability and relative importance of each of these techniques for evaluating modelling predictions of water and solute transport under high efficiency irrigation systems.
In the present investigation, the performance of HYDRUS-2D in simulating water movement, soil solution salinity, and nitrate movement under a mandarin tree during one season was assessed using eleven statistical measures: mean error (ME), mean absolute error (MAE), root mean square error (RMSE), paired t-test (t_cal), coefficient of determination (R²), model efficiency (E), index of agreement (IA), relative model efficiency (E_rel), relative index of agreement (IA_rel), modified E (E₁) and modified IA (IA₁). The aim of this comparison was to identify which subset of the statistical measures is most appropriate for evaluating model performance.

Materials and Methods
The statistical tests were employed on the measured and simulated data generated from the field experiment on mandarin and modelling simulations illustrated in our earlier paper Phogat et al. [3]. However, a brief description of experimental details and modelling technique is presented here.

Experimental detail
Modelling evaluation was performed on field experimental data collected at Dareton Agricultural and Advisory Station (34.10°S, 142.04°E), located in the Coomealla Irrigation Area, New South Wales, Australia, for one season during 2006-2007. The field experiment involved surface drip irrigation of mandarin, established in October 2005. The trees were planted at a spacing of 5 m × 2 m between rows and plants, respectively. The trees were managed and fertilized following current commercial practices. The total yearly rainfall during the experimental period was 187 mm, which was significantly less than the annual potential evapotranspiration (1400 mm).

The range of E lies between −∞ and 1.0 (perfect fit). An efficiency value between 0 and 1 is generally viewed as an acceptable level of performance. Efficiency lower than zero indicates that the mean value of the observed time series would be a better predictor than the model, and denotes unacceptable performance [45].
The index of agreement (IA) was proposed by Willmott (1981), and is based on the ratio of the mean square error to the potential error. The value of IA varies between 0 and 1. A value of 1 indicates a perfect agreement between measured and simulated values, and 0 signifies no agreement at all.
Relative efficiency criteria: The criteria described above (R², E, and IA) quantify the difference between observations and predictions in absolute values. As a result, an over- or under-prediction of larger values has, in general, a greater influence than that of smaller values. To counteract this, efficiency measures based on relative deviations can be derived from E and IA, yielding the relative efficiency (E_rel) and the relative index of agreement (IA_rel). These parameters can also range between the values described for E and IA, respectively.
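For reference, the efficiency criteria discussed above are commonly written in the following forms, with M_i the measured value, S_i the simulated value, \bar{M} the mean of the measured values, and n the number of pairs. These are the standard expressions found in the model evaluation literature (for example, the relative forms as given by Krause et al. [25]); they are offered as a sketch and may differ in notation from the expressions used by the authors.

```latex
\[
E = 1 - \frac{\sum_{i=1}^{n} (M_i - S_i)^2}{\sum_{i=1}^{n} (M_i - \bar{M})^2}
\qquad
IA = 1 - \frac{\sum_{i=1}^{n} (M_i - S_i)^2}
              {\sum_{i=1}^{n} \left( |S_i - \bar{M}| + |M_i - \bar{M}| \right)^2}
\]
\[
E_{rel} = 1 - \frac{\sum_{i=1}^{n} \left( \frac{M_i - S_i}{M_i} \right)^2}
                   {\sum_{i=1}^{n} \left( \frac{M_i - \bar{M}}{\bar{M}} \right)^2}
\qquad
IA_{rel} = 1 - \frac{\sum_{i=1}^{n} \left( \frac{M_i - S_i}{M_i} \right)^2}
                    {\sum_{i=1}^{n} \left( \frac{|S_i - \bar{M}| + |M_i - \bar{M}|}{\bar{M}} \right)^2}
\]
```

In the relative forms each deviation is divided by the corresponding measured value, so a given absolute error counts for more when the measured value is small.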

Modified form of E and IA:
The modified forms of E and IA are extensively used to overcome the problem of squared differences and the oversensitivity to extreme values induced by the mean squared error in E and IA.

A no-flow boundary condition was established at the left and right edges of the soil profile, to account for flow and transport symmetry. A free drainage boundary condition was assumed at the bottom of the soil profile. Initial conditions for water, salinity and nitrate simulations were based on measured data, which are described in detail in Phogat et al. [3].
HYDRUS-2D requires daily values of potential evaporation (E_s) and transpiration (T_p), which were obtained using the dual crop coefficient approach and local meteorological data [42,43].

Statistical indicators
Error estimates: The model's performance was evaluated by comparing measured (M) and HYDRUS-2D simulated (S) values of water content, electrical conductivity of the soil solution (EC_sw), and nitrate concentration (NO₃⁻-N) in the soil, and calculating a range of error estimates, tests of significance, regression analyses, and dimensionless efficiency tests. The error estimates included mean error (ME), mean absolute error (MAE), and root mean square error (RMSE). The test of significance was conducted using the paired t-test (t_cal), in which n and s are the number of comparable paired points and the standard deviation, respectively; subscripts 1 and 2 denote measured and predicted values, respectively; S_m is the standard deviation of the mean; and t_cal is the calculated paired t-test value.
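The error estimates and the test statistic named above can be sketched in a few lines. The sign convention (measured minus simulated) is an assumption, chosen to be consistent with the later interpretation that positive t_cal values indicate measured values higher than simulated ones. The t statistic uses the textbook paired form (mean difference divided by its standard error), which may differ from the exact expression used by the authors. The sample values are hypothetical.

```python
import math
import statistics


def mean_error(measured, simulated):
    """Signed mean of (measured - simulated)."""
    return sum(m - s for m, s in zip(measured, simulated)) / len(measured)


def mean_absolute_error(measured, simulated):
    return sum(abs(m - s) for m, s in zip(measured, simulated)) / len(measured)


def root_mean_square_error(measured, simulated):
    return math.sqrt(
        sum((m - s) ** 2 for m, s in zip(measured, simulated)) / len(measured)
    )


def paired_t(measured, simulated):
    """Textbook paired t statistic: mean difference over its standard error."""
    d = [m - s for m, s in zip(measured, simulated)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Hypothetical water contents (cm3/cm3), for illustration only.
measured = [0.20, 0.25, 0.30, 0.22]
simulated = [0.18, 0.27, 0.29, 0.25]

print("ME   =", round(mean_error(measured, simulated), 4))
print("MAE  =", round(mean_absolute_error(measured, simulated), 4))
print("RMSE =", round(root_mean_square_error(measured, simulated), 4))
print("t    =", round(paired_t(measured, simulated), 3))
```

The calculated t value would then be compared with the tabulated value for n − 1 degrees of freedom at the chosen significance level.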
Efficiency criteria: The coefficient of determination (R²) was applied for testing the proportion of variance in the measured data explained by the model, and is defined as the square of the coefficient of correlation (r) according to Bravais-Pearson. Values of R² can vary between 0 and 1, with higher values indicating less error variance, and values greater than 0.5 typically considered acceptable [24].
Efficiency measures for the evaluation of model performance investigated in this study were: model efficiency (E), index of agreement (IA), relative model efficiency (E_rel), and relative index of agreement (IA_rel).
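Because R² is the squared Pearson correlation, it responds only to how well the simulated values co-vary with the measured ones. The following sketch, using hypothetical values and a hypothetical helper name, shows that a constant offset or a proportional scaling of the simulated series leaves R² unchanged.

```python
def r_squared(measured, simulated):
    """Square of the Pearson correlation coefficient between two series."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_s = sum(simulated) / n
    cov = sum((m - mean_m) * (s - mean_s) for m, s in zip(measured, simulated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov ** 2 / (var_m * var_s)


# Hypothetical concentrations (mg/L), for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
offset = [m + 15.0 for m in measured]   # every value over-predicted by 15
scaled = [2.0 * m for m in measured]    # every value doubled

print(r_squared(measured, offset))  # correlation is perfect despite the bias
print(r_squared(measured, scaled))  # likewise for proportional error
```

This is why R² is usually reported together with error measures that are sensitive to bias.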
Model efficiency (E), as proposed by Nash and Sutcliffe [44], is defined as one minus the sum of the absolute squared differences between the simulated and measured values, normalized by the variance of the measured values.

In the modified forms, the squared terms are replaced by an arbitrary power j, where j is a positive integer. In particular, when j = 1, the errors and differences are given their appropriate weighting, not inflated by their squared values. Hence, E₁ and IA₁ represent the modified forms of the efficiency and the index of agreement. Squaring in statistics (E₂ and IA₂) is useful because squares are easier to manipulate mathematically than absolute values, but the use of squares forces an arbitrarily greater influence on the statistic by way of the larger values [23]. These parameters can also range between the values described for E and IA, respectively.
The error measures (ME, MAE and RMSE) and the t-test were computed temporally (across all measurement/simulation depths on a weekly basis), spatially (across all weekly measurements/simulations for each depth), and across all individual measurements/simulations for the entire dataset. However, the regression and efficiency measures (R², E, IA, E_rel, IA_rel, E₁ and IA₁) were only evaluated on the entire dataset.
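The generalized forms with power j can be sketched as below, following the expressions commonly attributed to Legates and McCabe [23]: the numerator sums |M − S|^j and the denominators sum the corresponding deviations from the measured mean raised to the same power. Function names and sample data are hypothetical. Setting j = 2 recovers the original E and IA, and j = 1 gives E₁ and IA₁.

```python
def modified_efficiency(measured, simulated, j=1):
    """Generalized efficiency E_j; j=2 gives the Nash-Sutcliffe form."""
    mean_m = sum(measured) / len(measured)
    num = sum(abs(m - s) ** j for m, s in zip(measured, simulated))
    den = sum(abs(m - mean_m) ** j for m in measured)
    return 1.0 - num / den


def modified_index_of_agreement(measured, simulated, j=1):
    """Generalized index of agreement IA_j; j=2 gives the original IA."""
    mean_m = sum(measured) / len(measured)
    num = sum(abs(m - s) ** j for m, s in zip(measured, simulated))
    den = sum(
        (abs(s - mean_m) + abs(m - mean_m)) ** j
        for m, s in zip(measured, simulated)
    )
    return 1.0 - num / den


# Hypothetical series with one large value that is poorly simulated.
measured = [1.0, 2.0, 3.0, 4.0, 10.0]
simulated = [1.5, 2.5, 2.5, 3.5, 6.0]

for j in (1, 2):
    e_j = modified_efficiency(measured, simulated, j)
    ia_j = modified_index_of_agreement(measured, simulated, j)
    print(f"j = {j}:  E_j = {e_j:.3f}  IA_j = {ia_j:.3f}")
```

In this example the j = 1 values are lower than the j = 2 values, which illustrates why the modified forms are regarded as the more conservative measures.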
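The three levels of aggregation described above (temporal, spatial, and whole data set) amount to grouping the same measured/simulated pairs by different keys. A minimal sketch, assuming a hypothetical record layout of (day of year, depth, measured, simulated) and made-up values:

```python
import math
from collections import defaultdict

# Hypothetical records: (day_of_year, depth_cm, measured, simulated).
RECORDS = [
    (8, 25, 0.21, 0.20), (8, 50, 0.24, 0.26),
    (15, 25, 0.19, 0.22), (15, 50, 0.25, 0.24),
]


def rmse(pairs):
    """Root mean square error over (measured, simulated) pairs."""
    return math.sqrt(sum((m - s) ** 2 for m, s in pairs) / len(pairs))


def grouped_rmse(records, key_index):
    """RMSE per group, grouping on the field at key_index (0 = day, 1 = depth)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append((rec[2], rec[3]))
    return {key: rmse(pairs) for key, pairs in sorted(groups.items())}


temporal = grouped_rmse(RECORDS, 0)   # one value per sampling day, across depths
spatial = grouped_rmse(RECORDS, 1)    # one value per depth, across sampling days
overall = rmse([(r[2], r[3]) for r in RECORDS])

print("temporal:", temporal)
print("spatial: ", spatial)
print("overall: ", overall)
```

The same grouping applies to ME, MAE and the t-test by swapping the function applied to each group.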

Data sets for error parameter comparison
The total data set of weekly measured (M) and HYDRUS-2D simulated (S) water content, soil solution salinity (EC_sw), and nitrate-nitrogen concentration (NO₃⁻-N) at different depths, and their graphical comparisons, are described in Phogat et al. [3]. These data sets provide an ideal basis for comparing the range of error parameters, given that the three data sets (water content, salinity and nitrate concentration) represent a range from good matching between simulated and measured data (water content) to relatively poor matching (nitrate concentration). This allows comparison of the various error parameters across a range of error values and efficiency testing.

Comparison of error indices
Numerous error estimation methods are in use for comparing simulation results with measured data. These indices are valuable tools because they evaluate the error in the units of the constituent of interest, which helps in the analysis of results and describes the performance and utility of the modelling exercise. Weekly computed temporal error indices (ME, MAE, and RMSE) for measured and simulated water content, soil solution salinity (EC_sw) and nitrate-nitrogen content (NO₃⁻-N) are depicted as box plots in Figure 1, and the ranges of these parameters are shown in Table 1. Spatial values of the error indices are shown in Figure 2. Seasonal values of all statistical measures are displayed in Table 2.
Mean error (ME) is the signed measure of deviations between measured and simulated values, indicating whether the deviations tend to be positive or negative. The ME in temporal data ranged from -0.04 to 0.04 cm³.cm⁻³, -0.42 to 0.54 dS.m⁻¹, and -11.31 to 12.38 mg.L⁻¹ for water contents, EC_sw, and NO₃⁻-N concentrations, respectively (Table 1 and Figure 1). Similarly, spatial MEs for water contents, EC_sw, and NO₃⁻-N concentrations ranged from -0.01 to 0.04 cm³.cm⁻³, -0.24 to 0.34 dS.m⁻¹, and -21.09 to 4.21 mg.L⁻¹, respectively (Table 1 and Figure 2). Seasonal ME values for water content, EC_sw, and NO₃⁻-N for the entire data set were 0.003 cm³.cm⁻³, 0.12 dS.m⁻¹, and -6.7 mg.L⁻¹ (Table 2). This comparison revealed that ME was smallest for water content and greatest for NO₃⁻-N, with EC_sw in between. Although NO₃⁻-N errors showed the widest variability, a low value of ME may still conceal simulation inaccuracy due to the offsetting effect of large positive and negative errors, hence the need to also consider MAE and RMSE.
Comparing MAE and RMSE in Figure 2 reveals that the magnitude of the inter-quartile range (IQR) was higher in RMSE than in MAE. IQR is the difference between the 25th and 75th percentiles, and indicates the magnitude of variation in the mid-range of error values. The magnitude of IQR in RMSE was 0.01 cm³ [...]

Spatial RMSE values were also higher than the MAE values at all depths shown in Figure 2, but the difference was much wider at shallow depths (10-25 cm), where RMSE values were 0.01 cm³.cm⁻³, 0.2 dS.m⁻¹, and 9.49 mg.L⁻¹ higher than MAE values for water content, EC_sw, and NO₃⁻-N, respectively. The higher variations in all types of errors at the surface depth (10-25 cm) reflect the assumption in the model of a constant atmospheric boundary flux during daily time steps, which deviates from actual conditions at the surface boundary, particularly the diurnal fluctuation in evaporation, which peaks in the daytime and decreases during the night [2].
Comparison of MAE and RMSE further indicated that as the magnitude of variations between measured and predicted values increased, RMSE increased disproportionately, as is evident from the NO₃⁻-N values. Similar trends for NO₃⁻-N were also obtained in the MAE and RMSE analysis of the whole data set, where the values of these parameters were 24.49 mg.L⁻¹ and 26.76 mg.L⁻¹, respectively (Table 2). RMSE was always larger than MAE, and varied with the variability of the error magnitude, because the errors are squared in RMSE before they are averaged. RMSE varies with the variability within the distribution of error magnitudes and with the square root of the number of errors (n^(1/2)), as well as with the average error magnitude (as MAE) [26]. Hence, RMSE gives a relatively high weight to large errors, as obtained in the case of NO₃⁻-N. On the other hand, MAE is a linear measure, which means that all individual differences are weighted equally in the average. Hence, MAE may be preferred over RMSE as a more natural measure of the average error, and for an unambiguous assessment of model predictions. Legates and McCabe expressed similar views, as RMSE produces inflated values when large outliers are present [23].
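The contrast between the three error indices discussed above can be illustrated with two hypothetical error series that share the same MAE. In the first, positive and negative errors cancel so that ME is zero. In the second, a single large error raises RMSE well above MAE. The values and the helper name are illustrative only.

```python
import math


def summarize(errors):
    """Return (ME, MAE, RMSE) for a list of signed errors."""
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return me, mae, rmse


uniform = [1.0, -1.0, 1.0, -1.0]   # equal-sized errors of alternating sign
spiky = [0.0, 0.0, 0.0, -4.0]      # one large error, otherwise perfect

for name, errors in (("uniform", uniform), ("spiky", spiky)):
    me, mae, rmse = summarize(errors)
    print(f"{name:8s} ME = {me:5.2f}  MAE = {mae:4.2f}  RMSE = {rmse:4.2f}")
```

Both series have an MAE of 1, yet ME is 0 for the first and RMSE is 2 for the second, which is the behaviour that motivates reporting the indices together.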
The use of these two measures (RMSE and MAE) suffers from a significant drawback, in that they do not indicate the direction of the error. However, this discrepancy may be ignored where the main focus of the comparison is the magnitude of the error rather than its direction.
There is no universally accepted threshold limit for error magnitude when judging the degree of accuracy of model performance. However, Singh et al. stated that RMSE and MAE values smaller than half of the standard deviation of the measured data (hSD_m) may be considered low and appropriate for model evaluation [46]. Therefore, RMSE, MAE, and hSD_m values obtained using temporal water content data are compared in Figure 3. It can be seen that these errors (RMSE and MAE) were higher than hSD_m except on a few occasions during mid-season (DOY 36 to 64) and during the terminal period (DOY 194 onward). Similarly, MAE and RMSE values were higher than hSD_m in all analyses of the complete data set shown in Table 2. Hence, model performance was relatively poor in view of this criterion.
In the context of such definitive measures of model performance, it is important to consider the natural variability inherent in the measured data against which the simulations are judged [47]. Real world variability of the natural environment, such as soil variations, as well as measurement inaccuracies, can cause measured data to vary relative to the best simulation. For example, in our study EnviroSCAN® sensors were used to measure the profile water content. The probes were properly calibrated during installation; however, measurements with capacitance probes are highly variable and sensitive to bulk electrical conductivity, temperature, and change in storage estimates [48]. The capacitance sensors used in access tubes may generate consistent errors ≤ 0.05 cm³.cm⁻³, which is similar to the variation observed in our study between EnviroSCAN-measured and simulated values.
The error associated with simulated data and outliers in observed data can be further minimised by optimizing the input data using complex weighting techniques and probability-distribution-based uncertainty analysis techniques, such as Monte Carlo simulation and Bayesian analysis frameworks, which deal with both random and systematic errors in the simulations [30,49].
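The half-standard-deviation criterion described above is simple to check. The sketch below assumes the sample standard deviation (n − 1 denominator) of the measured values; whether the sample or population form was used in the study is not stated here. The function name and data are hypothetical.

```python
import math
import statistics


def half_sd_check(measured, simulated):
    """Compare MAE and RMSE against half the standard deviation of measured data."""
    errors = [m - s for m, s in zip(measured, simulated)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    hsd = 0.5 * statistics.stdev(measured)  # assumption: sample standard deviation
    return {"MAE": mae, "RMSE": rmse, "hSD_m": hsd,
            "acceptable": mae < hsd and rmse < hsd}


# Hypothetical water contents (cm3/cm3), for illustration only.
measured = [0.18, 0.22, 0.26, 0.30]
close_fit = [0.19, 0.21, 0.27, 0.29]
flat_fit = [0.24, 0.24, 0.24, 0.24]   # simulation stuck at the measured mean

print(half_sd_check(measured, close_fit))
print(half_sd_check(measured, flat_fit))
```

A simulation that merely reproduces the mean of the measurements fails the criterion, while one that tracks the measured variation passes it.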

Test of significance
The paired t-test was used to evaluate the significance of differences between measured and simulated data on water content, EC_sw, and NO₃⁻-N content (Figure 4). It showed non-significant differences (p = 0.05) between mean values of temporal measured and simulated water content. However, positive t_cal values during the early period showed that the measured values were higher than the corresponding simulated values, and the opposite was true later in the season (Figure 4a). Similarly, non-significant differences were observed for soil salinity, except at DOY 36 and 43, where differences between measured and simulated mean EC_sw were significant. However, significant differences were observed in NO₃⁻-N content from DOY 78 to 134, which corresponded to a period from March 2007 to early May 2007.
The t-test also showed significant differences in the spatial data set at depths of 10, 25, 80, and 110 cm for water content, 25, 100, and 150 cm for NO₃⁻-N content, and 100 and 150 cm for EC_sw (data not shown here). These findings agree with visual observation of the dataset [3]. Additionally, the t_cal values for water content and EC_sw for the whole season (Table 1) were non-significant at p = 0.05, whereas the t_cal value for NO₃⁻-N showed a significant difference, indicating a relatively poor performance of the model for nitrate simulation. However, the t-test represents variation between the mean values of measured and simulated data, and therefore a single seasonal figure may not reflect the degree of spatial and temporal heterogeneity across the season. Moreover, the t-test assumes that the measured and simulated values are normally distributed, and that both groups have equal variance. These assumptions may not be perfectly satisfied, and this calls into question the reliability of this statistical measure.
It is also important to understand that hypothesis driven tests, such as the paired t-test, should not be relied on solely to measure the reliability of simulations, as the degree of random variation determines the detection of a significant difference; significant systematic bias will be less likely to be detected if it is accompanied by large random errors [50,51].

Regression analysis and efficiency testing
Model performance was also assessed using regression analysis (Table 2).
The R² value of 0.5 for water content was just at the margin of the satisfactory level [24]. However, its values for EC_sw (0.59) and NO₃⁻-N content (0.56) were within the acceptable limit. Hence, R² produced relatively similar results across all simulated processes (water content, EC_sw, and NO₃⁻-N) (Table 2). This is contrary to our results using error tests, where the variability in water content was comparatively smaller than in NO₃⁻-N content (Table 2). This reveals a serious drawback in considering R² values alone for model performance evaluation, in that it only quantifies dispersion among values. Krause et al. reported that a model which systematically over- or under-predicts at all times will still result in R² values close to 1.0, even if all predictions are wrong, which undermines the reliability of R² values [25]. Similarly, Legates and McCabe suggested that correlation based measures are inappropriate and should not be used to evaluate the goodness-of-fit of model simulations, as these measures are oversensitive to extreme values and are insensitive to additive and proportional differences between model predictions and observations [23]. Hence, consideration of R² alone for model performance assessment sometimes leads to a flawed acceptance of modelling results.

Six efficiency tests were applied to the data set to evaluate their relative performance. The E, E_rel, IA_rel, E₁ and IA₁ values for water content were 0.43, 0.84, 0.82, 0.20 and 0.59, respectively (Table 2), suggesting a good match between measured and simulated data, as indicated by previous parameters. On the contrary, IA (0.36) showed relatively poor efficiency of the model for simulating water content distribution.
However, values of the modified efficiency (E₁) and index of agreement (IA₁) were lower than the relative estimates, because these statistics utilize absolute values rather than squared differences in their computation, which makes them more conservative measures [23]. The E value (0.43) was within the satisfactory limit reported in other studies [24,45]. Similarly, the E, IA, E_rel, IA_rel, E₁ and IA₁ values for weekly soil solution salinity (EC_sw) data were 0.30, 0.49, 0.85, 0.80, 0.32 and 0.69, respectively, which are well within acceptable limits, and match previous indicators relatively well. Relative efficiency (E_rel) for water and relative index of agreement (IA_rel) for water and salinity were relatively close to their precursors, E and IA, respectively (Table 2).
Conversely, the nitrate simulation provides a more complicated picture. The Nash and Sutcliffe efficiency (E = 0.12) and index of agreement (IA = 0.80) values are within acceptable limits, and in fact IA is quite high, in contrast to previous error parameters for nitrate concentration. The modified estimates (E₁ and IA₁) fall within the acceptable range. However, large negative values of E_rel (-319.25) and IA_rel (-71.3) reflect the wide divergence between measured and simulated values at certain times during the simulation. It should be noted that relative deviations reduce the influence of the absolute differences between the measured and simulated values.
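One way in which a relative efficiency can become strongly negative while the absolute efficiency remains positive is when a few measured values are small: dividing the deviation by a small measured value inflates its contribution. The sketch below uses hypothetical concentrations and the standard relative form of the Nash-Sutcliffe efficiency. It illustrates the arithmetic only and is not a reconstruction of the nitrate data in this study.

```python
def nse(measured, simulated):
    """Nash-Sutcliffe efficiency based on absolute squared deviations."""
    mean_m = sum(measured) / len(measured)
    num = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    den = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - num / den


def nse_relative(measured, simulated):
    """Relative efficiency; requires all measured values to be non-zero."""
    mean_m = sum(measured) / len(measured)
    num = sum(((m - s) / m) ** 2 for m, s in zip(measured, simulated))
    den = sum(((m - mean_m) / mean_m) ** 2 for m in measured)
    return 1.0 - num / den


# Hypothetical concentrations (mg/L); the first measured value is very small.
measured = [0.5, 20.0, 40.0, 60.0]
simulated = [10.0, 22.0, 38.0, 58.0]

print("E     =", round(nse(measured, simulated), 3))
print("E_rel =", round(nse_relative(measured, simulated), 1))
```

Here a single over-prediction at a low measured concentration dominates the relative measure, while it barely affects E.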
It is significant that the Nash-Sutcliffe efficiency (E), the most frequently used indicator in hydrologic studies, is much more sensitive to errors in higher values, as the differences between measured and simulated values are squared. As a result, larger values in a data series have a much higher weighting, whereas lower values are neglected [23]. This comparison suggests that E and/or IA may not always be suitable parameters for describing model performance. Additionally, the large negative values of E_rel and IA_rel for nitrate showed disproportionately high under-prediction, as reported in Krause et al. [25]. Hence, these parameters proved to be sensitive only to large variations in values and not at all to small divergences because, due to the summation of the absolute or squared errors in efficiency testing methods, emphasis is placed on larger errors while smaller errors tend to be neglected. Hence, the modified efficiency (E₁) and index of agreement (IA₁) could be more appropriate measures for testing model efficiency than their precursors (E and IA) and the relative statistics (E_rel and IA_rel). Moreover, IA₁ has the advantage of being bounded between 0.0 and 1.0 [23]. However, the good modelling efficiency shown by the E₁ and IA₁ statistics contradicts the poor simulation of NO₃⁻-N revealed by the error estimates. Additionally, efficiency measures, like the t-test, evaluate the mean variability in the domain and are unable to capture the modelling divergence at a particular point.
Overall, it can be stated that none of the efficiency parameters which were evaluated in this study adequately described and tested the reliability of model predictions. Each has specific pros and cons, which have to be taken into account during model calibration and validation. Hence, for a sound model performance evaluation, a combination of different statistical efficiency criteria, complemented by the assessment of the absolute or relative error, may need to be included.

Conclusion
Simulation models have been increasingly used in high efficiency drip irrigation systems to evaluate the water and solute dynamics under cropped conditions, and to suggest necessary management options to optimise the system efficiency. In this study, eleven statistical measures were used to compare HYDRUS-2D simulated values of water content, soil solution salinity (EC_sw), and nitrate-nitrogen (NO₃⁻-N) dynamics with field measured values obtained under a drip irrigated mandarin crop over a season. The statistical parameters compared were mean error (ME), mean absolute error (MAE), root mean square error (RMSE), paired t-test (t_cal), coefficient of determination (R²), model efficiency (E), index of agreement (IA), relative model efficiency (E_rel), relative index of agreement (IA_rel), modified E (E₁), and modified IA (IA₁). The purpose of applying all of these parameters to the same data sets was to evaluate the relative importance of these parameters in model performance testing.
The error parameters (ME, MAE and RMSE) remained within acceptable limits when applied to measured and simulated values of water content and EC_sw, whilst a wider range of values of MAE (1.44 to 27.65 mg.L⁻¹) and RMSE (2.00 to 39.57 mg.L⁻¹) obtained for nitrate (NO₃⁻-N) indicated poor agreement between simulated and measured values for this data set. Low ME values may conceal simulation inaccuracy due to the offsetting effect of large positive and negative errors. The results revealed that RMSE values were consistently higher than MAE due to squaring of the difference between measured and simulated values. Hence, it was concluded that among the error tests, MAE may be preferred over ME and RMSE for evaluating goodness-of-fit of the simulated values.
Paired t-test values revealed a non-significant difference (p = 0.05) between weekly measured and predicted water content and EC_sw distributions. However, differences were significant for the NO₃⁻-N distribution during the mid-season and at several spatial depths. Similarly, regression analysis (R²) and efficiency testing methods (E, IA, E_rel, IA_rel, E₁ and IA₁) also indicated that the model accurately predicted seasonal changes in water and salinity distributions in the soil. However, negative values of E_rel (-319.25) and IA_rel (-71.3) for NO₃⁻-N reflected the relatively poor prediction of NO₃⁻-N dynamics in the soil. It should be noted that relative deviations reduce the influence of the absolute differences between the measured and simulated values.
It was concluded that, for reliable model performance evaluation, a combination of different statistical efficiency criteria, along with the assessment of the absolute or relative volume error, must be included. Taken together, these comparisons were able to provide an objective assessment of the closeness of the simulated behaviour to the observed measurements of water, salinity and nitrate distribution in the soil under mandarin. It is expected that such studies would help in improving the performance evaluation and reliability of modelling data on irrigation and fertigation programme of horticultural crops, and contribute to improving system efficiency, and reducing environmentally harmful agro-hydrological practices.