Conference proceedings: Machine learning (ML95)

Supervised and Unsupervised Discretization of Continuous Features


Abstract

Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised discretization method, to entropy-based and purity-based methods, which are supervised algorithms. We found that the performance of the Naive-Bayes algorithm significantly improved when features were discretized using an entropy-based method. In fact, over the 16 tested datasets, the discretized version of Naive-Bayes slightly outperformed C4.5 on average. We also show that in some cases, the performance of the C4.5 induction algorithm significantly improved if features were discretized in advance; in our experiments, the performance never significantly degraded, an interesting phenomenon considering the fact that C4.5 is capable of locally discretizing features.
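The abstract contrasts unsupervised binning with supervised, entropy-based discretization but does not give the procedures. The following is a minimal sketch under stated assumptions, not the paper's exact method: it takes binning to mean equal-width intervals, and it shows only a single entropy-minimizing cut point for one feature. The function names (`equal_width_bins`, `class_entropy`, `best_entropy_cut`) and the toy data are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def equal_width_bins(values, k):
    """Unsupervised binning (assumed equal-width): map each value to one of k intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0 for _ in values]  # all values identical: a single bin
    # Clamp so that the maximum value falls in the last bin (index k - 1).
    return [min(int((v - lo) / width), k - 1) for v in values]


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Supervised split: the threshold that minimizes the weighted class entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * class_entropy(left)
                 + len(right) * class_entropy(right)) / n
        if score < best_score:
            # Place the cut midway between the two neighboring values.
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut, best_score


# Hypothetical toy data: one continuous feature and a binary class label.
values = [1.0, 2.0, 3.0, 10.0]
labels = ["a", "a", "b", "b"]
bins = equal_width_bins(values, 2)
cut, score = best_entropy_cut(values, labels)
```

On this toy data, the equal-width bins ignore the labels and group the first three values together, while the entropy-based cut lands between 2.0 and 3.0, where the class changes. A full entropy-based discretizer would typically apply such a split recursively with some stopping criterion, which the abstract does not describe.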
