CAIM discretization algorithm

Abstract

The task of extracting knowledge from databases is quite often performed by machine learning algorithms. The majority of these algorithms can be applied only to data described by discrete numerical or nominal attributes (features). For continuous attributes, a discretization algorithm is needed to transform them into discrete ones. We describe such an algorithm, called CAIM (class-attribute interdependence maximization), which is designed to work with supervised data. The goal of the CAIM algorithm is to maximize the class-attribute interdependence and to generate a (possibly) minimal number of discrete intervals. Unlike some other discretization algorithms, CAIM does not require the user to predefine the number of intervals. Tests comparing CAIM with six other state-of-the-art discretization algorithms show that the discrete attributes generated by CAIM almost always have the lowest number of intervals and the highest class-attribute interdependence. Two machine learning algorithms, the CLIP4 rule algorithm and the decision tree algorithm, were used to generate classification rules from data discretized by CAIM. For both algorithms, the generated rules are more accurate and fewer in number when the data are discretized with CAIM than when they are discretized with the six other algorithms, and the highest classification accuracy was achieved on data sets discretized with CAIM.
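The abstract states CAIM's goal but not its criterion or search procedure. As an illustration only, the Python sketch below assumes the criterion as it is usually stated for CAIM: for a partition of one attribute into n intervals, CAIM = (1/n) Σ_r max_r² / M_r, where max_r is the largest single-class count in interval r and M_r is the total number of samples in that interval. It also assumes a greedy search that starts from the single interval [min, max], repeatedly adds the midpoint boundary giving the highest CAIM value, and stops once the score no longer improves and there are at least as many intervals as classes. The names `caim_score` and `caim_discretize` are hypothetical, and every candidate is re-scored from scratch for clarity rather than speed.

```python
from bisect import bisect_left
from typing import Hashable, List, Sequence


def caim_score(values: Sequence[float], labels: Sequence[Hashable],
               boundaries: List[float]) -> float:
    """CAIM value of a partition, assuming CAIM = (1/n) * sum_r max_r**2 / M_r."""
    n_intervals = len(boundaries) - 1
    # One class-count table per interval.
    counts = [{} for _ in range(n_intervals)]
    for v, y in zip(values, labels):
        # Intervals are [b0, b1], (b1, b2], ..., (b_{n-1}, b_n].
        r = max(bisect_left(boundaries, v) - 1, 0)
        counts[r][y] = counts[r].get(y, 0) + 1
    total = 0.0
    for table in counts:
        m_r = sum(table.values())  # number of samples falling in interval r
        if m_r:  # skip empty intervals instead of dividing by zero
            total += max(table.values()) ** 2 / m_r
    return total / n_intervals


def caim_discretize(values: Sequence[float],
                    labels: Sequence[Hashable]) -> List[float]:
    """Greedy CAIM-style search (assumed procedure); returns sorted boundaries."""
    distinct = sorted(set(values))
    boundaries = [distinct[0], distinct[-1]]  # one interval covering [min, max]
    # Candidate cut points: midpoints between adjacent distinct values.
    remaining = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    n_classes = len(set(labels))
    global_caim, k = 0.0, 1  # k = current number of intervals
    while remaining:
        # Score every remaining candidate; the first (smallest) cut wins ties.
        best_score, best_cut = -1.0, None
        for cut in remaining:
            score = caim_score(values, labels, sorted(boundaries + [cut]))
            if score > best_score:
                best_score, best_cut = score, cut
        # Accept if CAIM improves, or while there are fewer intervals than classes.
        if best_score > global_caim or k < n_classes:
            boundaries = sorted(boundaries + [best_cut])
            remaining.remove(best_cut)
            global_caim, k = best_score, k + 1
        else:
            break
    return boundaries


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]
    ys = ["a", "a", "a", "b", "b", "b"]
    print(caim_discretize(xs, ys))  # [1, 3.5, 6]
```

On this toy attribute the cut at 3.5 separates the two classes perfectly (CAIM = 3.0); any further cut lowers the score to 2.0, and two intervals already match the two classes, so the search stops without the user having to fix the number of intervals in advance.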