Journal: Intelligent Data Analysis

The fitness-rough: A new attribute reduction method based on statistical and rough set theory


Abstract

Attribute reduction has become an important pre-processing task for reducing the complexity of data mining. Rough reducts, statistical methods and correlation-based methods have each contributed to improving attribute reduction techniques to a certain extent. Statistical methods generally have lower computational complexity than rough reducts and correlation-based methods, but many have shown that the rough reducts method is effective at reducing important attributes without causing too much information loss. Correlation-based methods, on the other hand, evaluate features as a subset rather than as individual attributes. In this paper, we propose a combination of statistical and rough set methods to reduce important attributes in a simpler way while keeping information loss from the raw data low. The fitness-rough method (FsR) identifies important attributes in the raw data, which is then further simplified into a more compact information table. We have also examined the problem of information loss in this method. Ten UCI machine learning datasets were used to test the proposed method against the classical rough reducts (RR) method, the statistical entropy (ENT) method and the correlation-based feature selection (CFS) method. Experimental results show that our method performs comparatively well, with higher reduction strength and smaller rule sets than the benchmark methods, especially on medium-sized datasets. However, the FsR method is less efficient on mixed-mode and nominal datasets, as the non-quantitative attributes in these datasets are normally pre-categorised.
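The abstract does not spell out the steps of FsR, so the following is only a minimal sketch of the classical rough-set idea it builds on: an attribute can be dropped if the remaining attributes still determine the decision for the same share of rows (the dependency degree). The function names, the greedy removal order and the toy table are all assumptions made for illustration; this is not the authors' FsR procedure, nor an exhaustive reduct search.

```python
from collections import defaultdict


def dependency_degree(rows, cond_attrs, decision):
    """Share of rows whose equivalence class on cond_attrs has one decision value.

    This is the rough-set dependency degree |POS_C(D)| / |U|.
    """
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in cond_attrs)
        classes[key].append(row[decision])
    # A class belongs to the positive region if all its rows agree on the decision.
    consistent = sum(len(v) for v in classes.values() if len(set(v)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, cond_attrs, decision):
    """Drop attributes one at a time while the dependency degree is unchanged.

    Illustrative heuristic only: the result depends on the attribute order
    and is not guaranteed to be a minimum-size reduct.
    """
    target = dependency_degree(rows, cond_attrs, decision)
    kept = list(cond_attrs)
    for attr in cond_attrs:
        trial = [a for a in kept if a != attr]
        if dependency_degree(rows, trial, decision) == target:
            kept = trial
    return kept


# Hypothetical toy table: "play" is "yes" only when outlook is sunny and it is not windy.
rows = [
    {"outlook": "sunny", "windy": "no",  "temp": "hot",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "temp": "hot",  "play": "no"},
    {"outlook": "rainy", "windy": "no",  "temp": "mild", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "temp": "mild", "play": "no"},
    {"outlook": "sunny", "windy": "no",  "temp": "mild", "play": "yes"},
    {"outlook": "rainy", "windy": "no",  "temp": "hot",  "play": "no"},
]

attrs = ["outlook", "windy", "temp"]
reduct = greedy_reduct(rows, attrs, "play")
print(reduct)
```

On this toy table, "temp" can be removed without lowering the dependency degree, while removing "outlook" or "windy" makes some rows indistinguishable despite having different decisions. That is the sense in which a reduct shrinks the information table without losing information.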
