Journal: International Journal on Communications Antenna and Propagation

Effect of Missing Data Treatment on the Predictive Accuracy of C4.5 Classifier


Abstract

Missing data is a common problem for researchers in machine learning applications. Missing values affect both the performance of analysis tools and the quality of the decisions drawn from them. This research analyzes the impact of four missing data treatment methods on the predictive accuracy of the C4.5 decision tree algorithm. It also investigates the imputation accuracy of each imputation method using a single dataset with missing values present in a single variable. The work was performed under three missing data assumptions, namely Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR), with three missingness rates: 5%, 10%, and 15%. The methods used to treat the missing data are list-wise deletion, mean/mode imputation, k-nearest neighbor imputation, and decision tree imputation. The experimental results showed that the C4.5 classifier achieved better performance under the MCAR assumption, while mean/mode imputation yielded the highest C4.5 predictive accuracy under the MAR and MNAR assumptions. The k-nearest neighbor method obtained the most accurate imputation result under the MCAR assumption, whereas mean/mode imputation was the most accurate method under the MAR assumption. On the other hand, the lowest imputation accuracy levels were observed under the MNAR assumption and were attributed to the mean/mode imputation method.
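The setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code or data. Everything in it is an assumption made for illustration: the toy rows, the column names `x`, `g`, and `y`, the helper names, the deterministic "largest values go missing" rule used for MAR and MNAR, and the use of RMSE as the imputation accuracy measure. Decision tree imputation and the C4.5 classifier itself are left out to keep the example short.

```python
import random
import statistics


def inject_missing(rows, target, rate, mechanism, driver=None, seed=0):
    """Return a copy of rows with `target` set to None in about rate * len(rows) rows.

    MCAR: every row is equally likely to lose its value.
    MAR:  rows with the largest values of another observed column (`driver`) lose it.
    MNAR: rows with the largest values of `target` itself lose it.
    The MAR/MNAR rules are a deliberate simplification for this sketch.
    """
    rng = random.Random(seed)
    n_missing = round(rate * len(rows))
    idx = list(range(len(rows)))
    if mechanism == "MCAR":
        chosen = rng.sample(idx, n_missing)
    elif mechanism == "MAR":
        chosen = sorted(idx, key=lambda i: rows[i][driver], reverse=True)[:n_missing]
    elif mechanism == "MNAR":
        chosen = sorted(idx, key=lambda i: rows[i][target], reverse=True)[:n_missing]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    out = [dict(r) for r in rows]
    for i in chosen:
        out[i][target] = None
    return out


def listwise_delete(rows, target):
    """List-wise deletion: drop every row whose target value is missing."""
    return [dict(r) for r in rows if r[target] is not None]


def mean_impute(rows, target):
    """Mean imputation: fill each missing value with the mean of the observed ones."""
    mean = statistics.fmean(r[target] for r in rows if r[target] is not None)
    out = [dict(r) for r in rows]
    for r in out:
        if r[target] is None:
            r[target] = mean
    return out


def knn_impute(rows, target, features, k=3):
    """k-NN imputation: fill with the mean target of the k closest complete rows."""
    complete = [r for r in rows if r[target] is not None]
    out = [dict(r) for r in rows]
    for r in out:
        if r[target] is None:
            nearest = sorted(
                complete,
                key=lambda c: sum((c[f] - r[f]) ** 2 for f in features),
            )[:k]
            r[target] = statistics.fmean(c[target] for c in nearest)
    return out


def rmse(truth, imputed, holed, target):
    """Root mean squared error over the cells that were made missing."""
    errs = [
        (t[target] - i[target]) ** 2
        for t, i, h in zip(truth, imputed, holed)
        if h[target] is None
    ]
    return (sum(errs) / len(errs)) ** 0.5


# Toy data (hypothetical): y depends on x; g is an unrelated observed column.
rows = [{"x": float(i), "g": float((i * 7) % 20), "y": 2.0 * i} for i in range(20)]

for mechanism in ("MCAR", "MAR", "MNAR"):
    for rate in (0.05, 0.10, 0.15):
        holed = inject_missing(rows, "y", rate, mechanism, driver="g")
        kept = len(listwise_delete(holed, "y"))
        e_mean = rmse(rows, mean_impute(holed, "y"), holed, "y")
        e_knn = rmse(rows, knn_impute(holed, "y", ["x"]), holed, "y")
        print(f"{mechanism} {rate:.0%}: kept={kept} mean={e_mean:.2f} knn={e_knn:.2f}")
```

The numbers printed for this toy dataset say nothing about the paper's findings. The sketch only shows how the three missingness assumptions, the three rates, and the treatment methods fit together in one experiment loop.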
