International Journal of Data Warehousing and Mining

An Efficient Method for Discretizing Continuous Attributes



Abstract

In this article, the authors present a novel method for finding optimal split points for the discretization of continuous attributes. Such a method can be used in many data mining techniques for large databases. The method consists of two major steps. In the first step, the search space is pruned using a bisecting region method that partitions the search space and returns the point with the highest information gain found in its search. The second step is a hill-climbing algorithm that starts from the point returned by the first step and greedily searches for an optimal point. The method was tested using fifteen attributes from two data sets. The results show that it drastically reduces the number of searches while identifying optimal or near-optimal split points. On average, there was a 98% reduction in the number of information gain calculations with only a 4% reduction in information gain.
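The abstract does not spell out either step, so the following Python sketch is only one plausible reading of the two-step idea, not the authors' algorithm. It assumes a binary split, candidate split points taken as midpoints between consecutive distinct values, and a bisecting step that probes the middle of each half of the current region and keeps the half with the higher information gain. The names `entropy`, `info_gain`, and `find_split` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels, split):
    """Information gain of the binary split `value <= split`."""
    left = [y for x, y in zip(values, labels) if x <= split]
    right = [y for x, y in zip(values, labels) if x > split]
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


def find_split(values, labels):
    """Return (split point, gain, number of gain evaluations)."""
    # Assumption: candidates are midpoints between consecutive distinct values.
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None, 0.0, 0

    cache = {}  # candidate index -> gain, so each point is evaluated once

    def gain(i):
        if i not in cache:
            cache[i] = info_gain(values, labels, candidates[i])
        return cache[i]

    # Step 1 (assumed form of the bisecting region search): probe the
    # middle of each half and keep the half whose probe scores higher.
    lo, hi = 0, len(candidates) - 1
    best = (lo + hi) // 2
    while hi - lo > 1:
        mid = (lo + hi) // 2
        left_probe = (lo + mid) // 2
        right_probe = (mid + 1 + hi) // 2
        if gain(left_probe) >= gain(right_probe):
            hi = mid
        else:
            lo = mid + 1
        best = max(best, left_probe, right_probe, key=gain)

    # Step 2: hill climbing from the best point seen in step 1.
    i = best
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(candidates)]
        j = max(neighbors, key=gain, default=i)
        if gain(j) <= gain(i):
            break
        i = j

    return candidates[i], gain(i), len(cache)
```

Calling `find_split(values, labels)` on a numeric attribute and its class labels returns the chosen split point, its information gain, and the number of gain evaluations performed, which can be compared against the number of candidates an exhaustive search would score. Because hill climbing only moves to a strictly better neighbor, the sketch can stop at a local optimum, which is in line with the abstract's "optimal or near-optimal" wording.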
