Geoderma: An International Journal of Soil Science

Using data mining to model and interpret soil diffuse reflectance spectra

Abstract

The aims of this paper are to compare different data mining algorithms for modelling soil visible-near infrared (vis-NIR: 350-2500 nm) diffuse reflectance spectra and to assess the interpretability of the results. We compared multiple linear regression (MLR), partial least squares regression (PLSR), multivariate adaptive regression splines (MARS), support vector machines (SVM), random forests (RF), boosted trees (BT) and artificial neural networks (ANN) for estimating soil organic carbon (SOC), clay content (CC) and pH measured in water (pH). The comparisons were also performed using a selected set of wavelet coefficients from a discrete wavelet transform (DWT). Feature selection techniques were tested as a way to reduce model complexity and to interpret and evaluate the models. The dataset consists of 1104 samples from Australia. Comparisons were made in terms of the root mean square error (RMSE), the corresponding R² and the Akaike Information Criterion (AIC). Ten-fold leave-group-out cross-validation was used to optimise and validate the models. Predictions of the three soil properties by SVM using all vis-NIR wavelengths produced the smallest RMSE values, followed by MARS and PLSR. RF and especially BT were outperformed by all other approaches. For all techniques, implementing them on a reduced number of wavelet coefficients, between 72 and 137 coefficients, produced better results. Feature selection (FS) using the variable importance for projection (FSVIP) returned 29-31 selected features, while FSMARS returned between 11 and 14 features. DWT-ANN produced the smallest RMSE of all techniques tested, followed by FSVIP-ANN and FSMARS-ANN. However, both the FSVIP-ANN and FSMARS-ANN models used a smaller number of features for the predictions than DWT-ANN.
This is reflected in their AIC, which suggests that, when both the accuracy and parsimony of the model are taken into consideration, the best SOC model was the FSMARS-ANN, and the best CC and pH models were those from FSVIP-ANN. Analysis of the selected bands shows that: (i) SOC is related to wavelengths indicating C-O, C=O, and N-H compounds, (ii) CC is related to wavelengths indicating minerals, and (iii) pH is related to wavelengths indicating both minerals and organic material. Thus, the results are sensible and can be used for comparison with other soils. A systematic comparison like the one presented here is important because the nature of the target function has a strong influence on the performance of the different algorithms. Crown Copyright
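
The trade-off between accuracy and parsimony described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact AIC formula used, so the least-squares form n·ln(RSS/n) + 2k below is an assumption, and the observations, predictions and feature counts are hypothetical values chosen only to show how a model with a slightly larger RMSE can still have the lower AIC when it uses far fewer features.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, computed as 1 - RSS/TSS."""
    mean_obs = sum(observed) / len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    tss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - rss / tss


def aic_least_squares(observed, predicted, n_features):
    """AIC for a least-squares fit: n * ln(RSS / n) + 2 * k.

    Assumed form; the paper's own AIC definition is not given in the abstract.
    """
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_features


# Hypothetical data, not taken from the paper.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_many = [1.1, 1.9, 3.1, 3.9, 5.1]  # hypothetical model with 100 features
pred_few = [1.2, 1.8, 3.2, 3.8, 5.2]   # hypothetical model with 12 features

for name, pred, k in [("many features", pred_many, 100),
                      ("few features", pred_few, 12)]:
    print(name,
          "RMSE=%.3f" % rmse(observed, pred),
          "R2=%.3f" % r_squared(observed, pred),
          "AIC=%.2f" % aic_least_squares(observed, pred, k))
```

In this toy case the model with more features has the smaller RMSE, but the penalty term 2k gives the sparser model the lower AIC, which is the kind of ranking change the abstract reports for the feature-selected ANN models relative to DWT-ANN.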
