Engineering and Applied Science Research

Improving quality of breast cancer data through pre-processing


Abstract

Data mining has recently become a promising approach to medical prognosis. Raw data, however, commonly suffer from outliers and class imbalance, both of which degrade a model's performance on unseen data. Thus, choosing appropriate data mining algorithms has a direct impact on the prediction model. The objective of this study is to investigate three kinds of data pre-processing techniques, namely outlier filtering, the Synthetic Minority Over-sampling TEchnique (SMOTE) and attribute selection, for improving the quality of breast cancer data at Srinagarind Hospital in Thailand. Three decision rule building techniques were employed: Decision Table with Naïve Bayes (DTNB), Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and PART Decision List. The performance of the proposed approaches was evaluated through the Area Under the receiver operating characteristic Curve (AUC) of the decision rules. Experimental results show that applying suitable data pre-processing, especially outlier filtering, can lead to a significant improvement in the prediction performance of decision rule models.
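To make the pre-processing and evaluation steps concrete, here is a minimal pure-Python sketch of outlier filtering, SMOTE-style over-sampling, and AUC computation. It is an illustration under assumptions, not the study's implementation: the abstract does not say which outlier filter was used, so an interquartile-range rule stands in for it, and the function names `iqr_filter`, `smote` and `auc`, their parameters, and the toy values are all hypothetical. Attribute selection and the rule learners (DTNB, RIPPER, PART) are left out.

```python
import random


def iqr_filter(rows, col, k=1.5):
    """Keep rows whose value in column `col` lies within
    [Q1 - k*IQR, Q3 + k*IQR]. The IQR rule is an assumed stand-in;
    the abstract does not name the outlier filter it used."""
    values = sorted(r[col] for r in rows)

    def quantile(q):
        # Linear interpolation between the two closest order statistics.
        pos = q * (len(values) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        return values[lo] + (values[hi] - values[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [r for r in rows if low <= r[col] <= high]


def smote(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic minority samples. Each one is a random
    point on the segment between a minority sample and one of its k
    nearest minority neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def auc(scores, labels):
    """AUC as the probability that a randomly chosen positive scores
    higher than a randomly chosen negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only; these are not data from the study.
clean = iqr_filter([(1.0,), (2.0,), (3.0,), (4.0,), (100.0,)], col=0)
extra = smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], n_new=5)
score = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

In a workflow like the one the abstract describes, over-sampling would normally be applied to the training portion only, so that synthetic samples do not leak into the data used to compute AUC.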
