Data preprocessing issues for incomplete medical datasets

Abstract

While an ample amount of medical information is available for data mining, many of the datasets are unfortunately incomplete: they are missing values that many machine learning algorithms require. Several approaches have been proposed for imputing missing values, using various reasoning steps to estimate them from the observed data. One important step in data mining is data preprocessing, in which unrepresentative data are filtered out of the data to be mined. However, none of the related studies on missing value imputation consider performing a data preprocessing step before imputation. Therefore, the aim of this study is to examine the effect of two preprocessing steps, feature selection and instance selection, on missing value imputation. Specifically, eight medical datasets are used, containing categorical, numerical and mixed types of data. Our experimental results show that imputation after instance selection can produce better classification performance than imputation alone. In addition, we demonstrate that imputation after feature selection does not have a positive impact on the imputation result.
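To make the "instance selection, then imputation" ordering concrete, here is a minimal sketch in plain Python. The abstract does not name the selection or imputation algorithms used in the study, so the choices below are assumptions made purely for illustration: Wilson-style edited nearest neighbour stands in for instance selection, column-mean imputation stands in for the imputation method, and all function names and the toy data are hypothetical.

```python
from math import dist
from statistics import mean

# Each row is (features, label); a missing feature value is None.
# Assumption: all features are numerical and at least one complete row survives selection.

def split_complete(rows):
    """Separate rows without missing values from rows with at least one."""
    complete = [r for r in rows if None not in r[0]]
    incomplete = [r for r in rows if None in r[0]]
    return complete, incomplete

def edited_nearest_neighbour(complete, k=3):
    """Keep an instance only if most of its k nearest neighbours share its label."""
    kept = []
    for i, (x, y) in enumerate(complete):
        others = [c for j, c in enumerate(complete) if j != i]
        neighbours = sorted(others, key=lambda c: dist(x, c[0]))[:k]
        agree = sum(1 for _, label in neighbours if label == y)
        if agree * 2 > len(neighbours):
            kept.append((x, y))
    return kept

def mean_impute(incomplete, reference):
    """Fill each None with the column mean computed over the reference instances."""
    n_features = len(reference[0][0])
    col_means = [mean(x[j] for x, _ in reference) for j in range(n_features)]
    return [
        ([col_means[j] if v is None else v for j, v in enumerate(x)], y)
        for x, y in incomplete
    ]

def select_then_impute(rows, k=3):
    """Filter the complete rows first, then impute the incomplete rows from what remains."""
    complete, incomplete = split_complete(rows)
    reference = edited_nearest_neighbour(complete, k)
    return reference + mean_impute(incomplete, reference)

# Toy data: two clusters, one mislabeled instance, one row with a missing value.
rows = [
    ([1.0, 1.0], "a"), ([1.2, 0.9], "a"), ([0.9, 1.1], "a"),
    ([5.0, 5.0], "b"), ([5.1, 4.9], "b"), ([4.9, 5.2], "b"),
    ([1.1, 1.0], "b"),   # label disagrees with its neighbours
    ([None, 1.0], "a"),  # incomplete row to be imputed
]
result = select_then_impute(rows)
```

In this toy run the mislabeled instance is filtered out before the column means are computed, so it no longer influences the imputed value. Comparing classifiers trained on `result` against classifiers trained on data imputed without the selection step would mirror, in spirit, the comparison the abstract describes.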

Bibliographic record

  • Source
    Expert Systems | 2016, Issue 5 | pp. 432-438 | 7 pages
  • Author affiliations

    Taichung Veterans General Hospital Department of Psychiatry Chiayi Branch Chiayi Taiwan;

    Asia University Department of Computer Science and Information Engineering Taichung Taiwan;

    Kaohsiung Municipal Chinese Medical Hospital Department of Pharmacy Kaohsiung Taiwan;

    Kaohsiung Medical University Graduate Institute of Natural Products Kaohsiung Taiwan;

    Chung Yuan Christian University Department of Information and Computer Engineering Taoyuan City Taiwan;

    National Central University Department of Information Management Taoyuan City Taiwan;

    Tennessee Technological University Department of Computer Science Cookeville TN USA;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    missing value; imputation; feature selection; instance selection; incomplete medical datasets;
