Development and selection of decision trees for water management: impact of data preprocessing, algorithms and settings
Everaert, G.; Pauwels, I.; Bennetsen, E.; Goethals, P. (2016). Development and selection of decision trees for water management: impact of data preprocessing, algorithms and settings. AI Communications 29(6): 711-723. https://hdl.handle.net/10.3233/aic-160711
In: AI Communications. IOS Press: Amsterdam. ISSN 0921-7126; e-ISSN 1875-8452
Classification and regression tree; parameterization; applicability; field data
Authors
- Everaert, G.
- Pauwels, I.
- Bennetsen, E.
- Goethals, P.
In the present research, we found that different preprocessing options and parameterizations of classification and regression trees alter their model fit and have a direct effect on their applicability for end-users. In terms of applicability, classification trees react differently to pruning than regression trees. Indeed, at high pruning levels, classification trees focus on the extreme values of the response variable, whereas regression trees are more likely to predict the intermediate values. Furthermore, when applying cross-validation with a high number of folds, modellers are likely to find one model that outperforms the other models in terms of reliability. Models were assessed for each parameterization based on the coefficient of determination (R²), the percentage of Correctly Classified Instances (CCI) and Cohen’s Kappa statistic. We found positive correlations (R² > 0.70) between the statistical criteria and a non-linear negative relation between the model fit and the level of pruning. Therefore, environmental modellers should use an exhaustive list of model parameterizations to develop and compare environmental models in a transparent and objective manner. General methodological guidelines derived from the present research may help modellers to efficiently select statistically and ecologically relevant models that meet the needs of users. The validity of our conclusions should be further tested for other datasets and scientific domains, as our findings are based on one set of freshwater data.
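For readers unfamiliar with two of the assessment criteria named in the abstract, the following is a minimal Python sketch of how the percentage of Correctly Classified Instances and Cohen's Kappa are conventionally computed from observed and predicted class labels. It is not the authors' code: the function name `cci_and_kappa` and the example labels are hypothetical, and the abstract does not state which software or exact formulation the study used.

```python
from collections import Counter


def cci_and_kappa(observed, predicted):
    """Return (CCI in percent, Cohen's kappa) for two equal-length label sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    n = len(observed)

    # Observed agreement: fraction of instances whose predicted class
    # matches the observed class.
    p_o = sum(o == p for o, p in zip(observed, predicted)) / n

    # Chance agreement: for each class, multiply its frequency among the
    # observations by its frequency among the predictions, then sum.
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    p_e = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)

    # Kappa is undefined when chance agreement equals 1 (a single class
    # in both sequences); NaN is returned in that case.
    kappa = float("nan") if p_e == 1 else (p_o - p_e) / (1 - p_e)
    return 100.0 * p_o, kappa


# Hypothetical labels, invented purely for illustration (not data from the study).
observed = ["low", "low", "high", "high", "mid", "mid", "high", "low"]
predicted = ["low", "mid", "high", "high", "mid", "low", "high", "low"]

cci, kappa = cci_and_kappa(observed, predicted)
print(f"CCI = {cci:.1f}%, kappa = {kappa:.3f}")
```

Kappa corrects the raw agreement for the agreement expected by chance, so a model that mostly predicts the dominant class can score a high CCI and still a low Kappa. This is one reason to report several criteria side by side when comparing parameterizations.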