Discretizing Continuous Attributes Using Information Theory

Abstract

Many classification algorithms require that training examples contain only discrete values. To use these algorithms when some attributes have continuous numeric values, the numeric attributes must first be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. The amount of information each interval gives to the target attribute is measured using Hellinger divergence, and the interval boundaries are chosen so that each interval contains as nearly equal an amount of information as possible. To compare our discretization method with some current discretization methods, several popular classification data sets are selected for discretization. We use a naive Bayesian classifier and C4.5 as classification tools to compare the accuracy of our discretization method with that of other methods.
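
The abstract gives neither the exact formula nor the boundary-search procedure, so the following is only a minimal sketch of the measurement step, not the paper's method. It assumes that the information an interval gives to the target attribute is the Hellinger divergence between the class distribution inside the interval and the overall class distribution, and it uses the unnormalized form sqrt(sum((sqrt(p_i) - sqrt(q_i))^2)). The function names and the `cut_points` parameter are hypothetical.

```python
import math
from collections import Counter


def class_distribution(labels, classes):
    """Relative frequency of each class in `labels`, ordered as in `classes`."""
    counts = Counter(labels)
    n = len(labels)
    return [counts[c] / n for c in classes]


def hellinger(p, q):
    """Hellinger divergence between two discrete distributions.

    Assumption: unnormalized form, so the value lies in [0, sqrt(2)].
    """
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def interval_information(values, labels, cut_points):
    """Divergence of each interval's class distribution from the prior.

    Intervals are (lo, hi], bounded by the sorted cut points.
    Assumption: an empty interval contributes 0.0.
    """
    classes = sorted(set(labels))
    prior = class_distribution(labels, classes)
    edges = [-math.inf] + sorted(cut_points) + [math.inf]
    info = []
    for lo, hi in zip(edges, edges[1:]):
        inside = [y for x, y in zip(values, labels) if lo < x <= hi]
        if inside:
            info.append(hellinger(class_distribution(inside, classes), prior))
        else:
            info.append(0.0)
    return info


# Toy data: one numeric attribute and a two-class target.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
info = interval_information(values, labels, cut_points=[4.5])
```

With this toy data the single cut at 4.5 separates the classes perfectly, so both intervals receive the same divergence from the prior. A boundary search in the spirit of the abstract would adjust `cut_points` until the per-interval values are as close to equal as possible; that search is not shown here.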