CD: A Coupled Discretization Algorithm

Abstract

Discretization plays an important role in data mining and machine learning. While numeric data is predominant in the real world, many supervised learning algorithms are restricted to discrete variables. A variety of research has therefore been conducted on discretization, the process of converting continuous attribute values into a limited number of intervals. Recent work derived from entropy-based discretization methods, which has produced impressive results, introduces information attribute dependency to reduce the uncertainty level of a decision table, but it gives no attention to the increase in the degree of certainty from the perspective of the positive domain ratio. This paper proposes a discretization algorithm based on both the positive domain and its coupling with information entropy, which considers not only information attribute dependency but also the deterministic feature relationship. Substantial experiments on extensive UCI data sets provide evidence that the proposed coupled discretization algorithm generally outperforms seven other existing methods, as well as the positive-domain-based algorithm also proposed in this paper, in terms of simplicity, stability, consistency, and accuracy.
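
The two quantities the abstract contrasts can be illustrated with a small sketch. This is not the paper's CD algorithm, and the abstract does not say how the two measures are coupled. The sketch only assumes that "information entropy" refers to the class entropy of the intervals produced by a cut point, and that "positive domain ratio" refers to the rough-set positive region ratio, that is, the fraction of objects whose discretized description determines the class uniquely. All function names (`discretize`, `cut_entropy`, `positive_region_ratio`) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def discretize(values, cuts):
    """Map each continuous value to the index of the interval it falls in."""
    cuts = sorted(cuts)
    return [sum(v > c for c in cuts) for v in values]


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def cut_entropy(values, labels, cut):
    """Weighted class entropy of the two intervals created by one cut point."""
    left = [y for v, y in zip(values, labels) if v <= cut]
    right = [y for v, y in zip(values, labels) if v > cut]
    n = len(labels)
    return sum(len(part) / n * class_entropy(part)
               for part in (left, right) if part)


def positive_region_ratio(rows, labels):
    """Fraction of objects whose discretized row maps to exactly one class.

    Assumption: this is the usual rough-set dependency degree |POS| / |U|,
    used here as a stand-in for the abstract's "positive domain ratio".
    """
    classes_by_row = defaultdict(set)
    for row, y in zip(rows, labels):
        classes_by_row[tuple(row)].add(y)
    consistent = sum(1 for row in rows if len(classes_by_row[tuple(row)]) == 1)
    return consistent / len(rows)


# Toy data (made up for illustration): one continuous attribute, two classes.
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "a", "b", "b", "b"]

for cut in (2.5, 3.5):
    rows = [[code] for code in discretize(values, [cut])]
    print(cut, cut_entropy(values, labels, cut),
          positive_region_ratio(rows, labels))
```

On this toy data the cut at 3.5 separates the classes perfectly, so it has zero weighted entropy and a positive region ratio of 1, while the cut at 2.5 leaves a mixed interval and scores worse on both measures. The two criteria need not agree in general, which is presumably the motivation for combining them.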
