Journal: Hydrology and Earth System Sciences Discussions

Systematic comparison of five machine-learning models in classification and interpolation of soil particle size fractions using different transformed data

Abstract

Soil texture and soil particle size fractions (PSFs) play an increasingly important role in physical, chemical, and hydrological processes. Many previous studies have used machine-learning and log-ratio transformation methods for soil texture classification and soil PSF interpolation to improve prediction accuracy. However, few reports have systematically compared their performance with respect to both classification and interpolation. Here, five machine-learning models (K-nearest neighbour (KNN), multilayer perceptron neural network (MLP), random forest (RF), support vector machines (SVM), and extreme gradient boosting (XGB)) were combined with the original data and three log-ratio transformation methods (additive log ratio (ALR), centred log ratio (CLR), and isometric log ratio (ILR)) to evaluate soil texture and PSFs, using both raw and log-ratio-transformed data from 640 soil samples in the Heihe River basin (HRB) in China. The results demonstrated that the log-ratio transformations decreased the skewness of the soil PSF data. For soil texture classification, RF and XGB performed better, with a higher overall accuracy and kappa coefficient. They were also recommended for classifying imbalanced data, according to the area under the precision-recall curve (AUPRC). For soil PSF interpolation, RF delivered the best performance among the five machine-learning models, with the lowest root-mean-square error (RMSE; sand 15.09%, silt 13.86%, clay 6.31%), mean absolute error (MAE; sand 10.65%, silt 9.99%, clay 5.00%), Aitchison distance (AD; 0.84), and standardized residual sum of squares (STRESS; 0.61), and the highest Spearman rank correlation coefficient (RCC; sand 0.69, silt 0.67, clay 0.69). STRESS was improved by using log-ratio methods, especially CLR and ILR. Prediction maps from direct and indirect classification were similar in the middle and upper reaches of the HRB. However, indirect classification maps using log-ratio-transformed data provided more detailed information in the lower reaches of the HRB. The kappa coefficient improved markedly, by 21.3%, when indirect rather than direct methods were used for soil texture classification. RF was recommended as the best strategy among the five machine-learning models, based on the accuracy evaluation of the soil PSF interpolation and soil texture classification, and ILR was recommended for component-wise machine-learning models without multivariate treatment, considering the constrained nature of compositional data. In addition, XGB was preferred over the other models when the trade-off between accuracy and runtime was considered. Our findings provide a reference for future work on the spatial prediction of soil PSFs and texture over large areas using machine-learning models when the soil PSF data have skewed distributions.
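The three log-ratio transformations and the Aitchison distance named in the abstract can be illustrated with a short sketch for a single three-part composition (sand, silt, clay). This is not the authors' code. The function names, the sample values, and in particular the choice of ILR basis are assumptions made here for illustration, since the abstract does not state which basis was used. The sketch also assumes that all parts are strictly positive, because zeros would need separate handling before taking logarithms.

```python
import math

def closure(parts):
    """Rescale positive parts so they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def alr(parts):
    """Additive log ratio: log of each part over the last part."""
    x = closure(parts)
    return [math.log(p / x[-1]) for p in x[:-1]]

def clr(parts):
    """Centred log ratio: log parts minus their mean log."""
    logs = [math.log(p) for p in closure(parts)]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def ilr(parts):
    """Isometric log ratio for three parts (assumed basis, see lead-in)."""
    sand, silt, clay = closure(parts)
    z1 = math.sqrt(2 / 3) * math.log(sand / math.sqrt(silt * clay))
    z2 = math.sqrt(1 / 2) * math.log(silt / clay)
    return [z1, z2]

def ilr_inverse(z):
    """Map two ILR coordinates back to fractions that sum to 1."""
    z1, z2 = z
    clr_coords = [
        math.sqrt(2 / 3) * z1,
        -z1 / math.sqrt(6) + z2 / math.sqrt(2),
        -z1 / math.sqrt(6) - z2 / math.sqrt(2),
    ]
    return closure([math.exp(v) for v in clr_coords])

def aitchison_distance(a, b):
    """Euclidean distance between the CLR vectors of two compositions."""
    return math.dist(clr(a), clr(b))

# Hypothetical sample: 40% sand, 40% silt, 20% clay.
sample = [40.0, 40.0, 20.0]
coords = ilr(sample)
recovered = ilr_inverse(coords)  # fractions summing to 1
```

In a component-wise setting such as the one the abstract describes, one plausible workflow is to fit a separate model to each transformed coordinate and then back-transform the predictions (here with `ilr_inverse`), so that the predicted sand, silt, and clay fractions always sum to 100%.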
