SEMCCO 2011; International Conference on Swarm, Evolutionary, and Memetic Computing
An Improved CART Decision Tree for Datasets with Irrelevant Feature

Abstract

The results of data mining tasks are usually improved by reducing the dimensionality of the data. This improvement, however, is harder to achieve when the data size is moderate or huge. Although numerous algorithms for accuracy improvement have been proposed, all assume that inducing a compact and highly generalized model is difficult. To address this issue, we introduce the Randomized Gini Index (RGI), a novel heuristic function for dimensionality reduction that is particularly applicable to large-scale databases. Apart from removing irrelevant attributes, our algorithm is capable of reducing the level of noise in the data to a greater extent, which is a very attractive feature for data mining problems. We extensively evaluate its performance through experiments on both artificial and real-world datasets. The outcome of the study shows the suitability and viability of our approach for knowledge discovery in moderate and large datasets.
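
The abstract does not define RGI, so the following is only a minimal sketch of the standard Gini index that CART uses to score candidate splits, which RGI presumably builds on. The function names (`gini_impurity`, `gini_gain`, `best_gain`) and the toy data are assumptions made for illustration and do not come from the paper. The sketch shows how a feature that is unrelated to the class label can receive no impurity reduction and could therefore be flagged as irrelevant.

```python
from collections import Counter


def gini_impurity(labels):
    """Standard Gini impurity: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(values, labels, threshold):
    """Impurity decrease of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


def best_gain(values, labels):
    """Largest gain over thresholds midway between sorted distinct values."""
    distinct = sorted(set(values))
    thresholds = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max((gini_gain(values, labels, t) for t in thresholds), default=0.0)


# Hypothetical toy data: one feature separates the classes, one does not.
labels = [0, 0, 0, 1, 1, 1]
relevant = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]
irrelevant = [5.0, 1.0, 3.0, 5.0, 1.0, 3.0]

print("relevant feature gain:  ", best_gain(relevant, labels))
print("irrelevant feature gain:", best_gain(irrelevant, labels))
```

On this toy data the relevant feature reaches the maximum possible gain of 0.5 for two balanced classes, while the irrelevant feature yields a gain of 0 up to floating-point rounding. How RGI randomizes or otherwise modifies this score is not stated in the abstract and is left open here.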
