Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

Selection-fusion approach for classification of datasets with missing values


Abstract

This paper proposes a new approach, based on missing-value pattern discovery, for classifying incomplete data. The approach is designed specifically for datasets with a small number of samples and a high percentage of missing values, where existing missing-value treatment approaches usually do not work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier on each subset. It then combines the outputs of the classifiers. Subset selection is translated into a clustering problem, which allows a mathematical framework to be derived for it. There is a trade-off between the computational complexity (the number of subsets) and the accuracy of the overall classifier. To deal with this trade-off, a numerical criterion is proposed for predicting the overall performance. The proposed method is applied to eight datasets: seven from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan. Experimental results show that the classification accuracy of the proposed method is superior to that of the widely used multiple imputation method and of four other methods. The results also show that the degree of superiority depends on the pattern and percentage of missing values.
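The selection-fusion idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that the abstract does not state: each distinct missing-value pattern is taken as a candidate feature subset (the paper instead formulates subset selection as a clustering problem), each subset gets a simple nearest-centroid classifier, and the outputs are fused by majority vote. All function names and the `min_samples` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def observed(row):
    """Indices of the features that are present (not None) in a sample."""
    return frozenset(i for i, v in enumerate(row) if v is not None)


def select_subsets(X, min_samples=2):
    """Selection step (simplified assumption): one candidate feature subset
    per distinct missing-value pattern, kept only if enough samples have
    all of those features available."""
    subsets = []
    for feats in sorted({observed(r) for r in X}, key=sorted):
        if not feats:
            continue
        rows = [i for i, r in enumerate(X) if feats <= observed(r)]
        if len(rows) >= min_samples:
            subsets.append((sorted(feats), rows))
    return subsets


def train_centroids(X, y, feats, rows):
    """Nearest-centroid model (a stand-in classifier): per-class feature means."""
    sums = defaultdict(lambda: [0.0] * len(feats))
    counts = Counter()
    for i in rows:
        counts[y[i]] += 1
        for k, f in enumerate(feats):
            sums[y[i]][k] += X[i][f]
    return {c: [s / counts[c] for s in sums[c]] for c in counts}


def predict_one(model, feats, row):
    """Label of the class whose centroid is closest in squared distance."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, model[c]))
    return min(sorted(model), key=dist)


def fit(X, y, min_samples=2):
    """Train one classifier per selected feature subset."""
    return [(feats, train_centroids(X, y, feats, rows))
            for feats, rows in select_subsets(X, min_samples)]


def predict(ensemble, row):
    """Fusion step: majority vote over the classifiers whose features are
    all available in the sample; None if no classifier applies."""
    avail = observed(row)
    votes = Counter(predict_one(model, feats, row)
                    for feats, model in ensemble if set(feats) <= avail)
    return votes.most_common(1)[0][0] if votes else None


# Toy data (made up for illustration): None marks a missing value.
X = [
    [1.0, 1.2, None], [0.9, 1.0, None], [5.0, 5.1, None], [5.2, 4.9, None],
    [None, 1.1, 0.2], [None, 0.8, 0.1], [None, 5.0, 3.0], [None, 5.3, 3.2],
]
y = [0, 0, 1, 1, 0, 0, 1, 1]

ensemble = fit(X, y)
print(len(ensemble))                        # number of trained classifiers
print(predict(ensemble, [1.0, 1.0, 0.1]))   # both classifiers vote
print(predict(ensemble, [None, 5.1, 3.1]))  # only the second classifier applies
```

In this sketch, raising `min_samples` or merging similar patterns would reduce the number of classifiers, which mirrors the trade-off between the number of subsets and overall accuracy that the abstract mentions.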
