Conference: IEEE Annual Ubiquitous Computing, Electronics and Mobile Communication Conference

A Provenance Meta Learning Framework for Missing Data Handling Methods Selection

Abstract

Missing data is a major problem in many real-world data sets and applications: it can produce wrong or misleading analysis results and lowers both the quality of, and confidence in, those results. Many missing data handling methods have been proposed in the research community, but no single method is universally best for all missing data problems. Selecting the right method for a specific problem usually depends on multiple intertwined factors. To ease this method selection problem, we propose a Provenance Meta Learning Framework. We conducted an extensive literature review of 118 survey papers on missing data handling methods published from 2000 to 2019. Based on this review, we analyse 9 influential factors and 12 selection criteria for missing data handling methods, and we perform a detailed analysis of 6 popular methods: 4 machine learning methods, namely KNN Imputation (KNNI), Weighted KNN Imputation (WKNNI), K-Means Imputation (KMI), and Fuzzy KMI (FKMI); and 2 ad hoc methods, namely Median/Mode Imputation (MMI) and Group/Class MMI (CMMI). We focus on method selection for 3 classification techniques: C4.5, KNN, and RIPPER. Our evaluation uses 25 real-world data sets from the KEEL and UCI repositories. Our framework suggests using KNNI to handle missing values when the missing data mechanism is Missing Completely At Random (MCAR), the missing data pattern is uni-attribute or monotone, the missing data rate is within [1%, 5%], the number of class labels is 2, and the sample size is no more than 10,000. Under these conditions, when the subsequent classification method is KNN or RIPPER, KNNI preserves classification performance better and achieves higher imputation accuracy and imputation exhaustiveness than the other 5 missing data handling methods.
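
The recommendation at the end of the abstract can be read as a conjunction of conditions on the data set and the downstream classifier. The sketch below encodes that reading as a simple predicate. It is not the framework itself: the `DatasetProfile` class, its field names, the string labels, and the function `knni_suggested` are all hypothetical names introduced here for illustration, and treating the listed conditions as a strict "all must hold" rule is an assumption about how the abstract should be interpreted.

```python
from dataclasses import dataclass


# Assumption: this data model is illustrative only; the abstract does not
# define how the framework represents a data set's characteristics.
@dataclass(frozen=True)
class DatasetProfile:
    mechanism: str        # missing data mechanism, e.g. "MCAR"
    pattern: str          # missing data pattern, e.g. "uni-attribute" or "monotone"
    missing_rate: float   # fraction of missing values, so 0.03 means 3%
    n_classes: int        # number of class labels
    n_samples: int        # sample size
    classifier: str       # subsequent classification method, e.g. "KNN"


def knni_suggested(p: DatasetProfile) -> bool:
    """Return True only when every condition quoted in the abstract holds.

    Assumption: the conditions are combined with a logical AND.
    """
    return (
        p.mechanism == "MCAR"
        and p.pattern in {"uni-attribute", "monotone"}
        and 0.01 <= p.missing_rate <= 0.05
        and p.n_classes == 2
        and p.n_samples <= 10_000
        and p.classifier in {"KNN", "RIPPER"}
    )


inside = DatasetProfile("MCAR", "monotone", 0.03, 2, 5_000, "RIPPER")
outside = DatasetProfile("MCAR", "monotone", 0.03, 2, 5_000, "C4.5")
print(knni_suggested(inside))   # True
print(knni_suggested(outside))  # False
```

A `False` result only means the stated conditions are not all met; the abstract does not say which method the framework would suggest in that case.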
