International Conference on Modelling, Identification and Control

Data Preprocessing and Classification for Taproot Site Data Sets of PANAX NOTOGINSENG

Abstract

Herbs from different producing regions differ in their active constituents and efficacy, and the quality of herbs from the authentic region is better than that of herbs from other producing regions. Many peddlers now substitute non-authentic herbs for authentic-region herbs to make more money, so it is important to distinguish herbs by producing region. This paper studies data preprocessing and classification for taproot site data sets of Panax notoginseng from three producing regions. We compare the effects of three preprocessing steps (data standardization, instance selection, and attribute selection) to find the best method and parameter settings for the data sets. We then apply different classification algorithms to the preprocessed data and compare their performance to find the optimal classification algorithm for the data sets. Classification performance in the experiments was evaluated by Percent Correct (PC), Mean Squared Error (MSE), Kappa Statistic (KS), Area Under ROC (AUR), and Mean Absolute Error (MAE). The results show that standardizing the data with decimal scaling and selecting the attribute subset {1,2,4,6,7,8} suits these data sets, and that the Random Forest and AdaBoost.M1 algorithms are the optimal classification algorithms for them, giving the best classification performance.
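The two preprocessing choices the abstract reports as best, decimal scaling and the attribute subset {1,2,4,6,7,8}, can be illustrated with a minimal sketch. It assumes the textbook definition of decimal scaling (divide each attribute by the smallest power of ten that brings its largest absolute value below 1) and assumes the attribute indices are 1-based; the abstract confirms neither detail. The function names and sample values are hypothetical and are not taken from the paper.

```python
def decimal_scale_column(values):
    """Scale one attribute by 10**j, with j the smallest non-negative
    integer such that max(|v|) / 10**j < 1 (textbook decimal scaling).
    Assumes `values` is a non-empty list of numbers."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values], j


def select_attributes(row, indices_1_based):
    """Keep only the listed attributes of one instance.
    Assumption: indices are 1-based, as in {1,2,4,6,7,8}."""
    return [row[i - 1] for i in indices_1_based]


# Hypothetical measurements for a single attribute (not real data).
scaled, exponent = decimal_scale_column([120.0, -986.0, 45.0])

# Hypothetical instance with eight attributes, reduced to the reported subset.
subset = select_attributes([10, 20, 30, 40, 50, 60, 70, 80], [1, 2, 4, 6, 7, 8])
```

Each attribute is scaled independently, so attributes measured on different scales get different exponents. The scaled, reduced instances would then be passed to a classifier such as Random Forest or AdaBoost.M1.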
