Journal: IEEE Transactions on Evolutionary Computation

Evolving accurate and compact classification rules with gene expression programming

Abstract

Classification is one of the fundamental tasks of data mining. Most rule induction and decision tree algorithms perform a local, greedy search and therefore often generate classification rules that are more complex than necessary. Evolutionary algorithms for pattern classification have recently received increased attention because they can perform a global search. In this paper, we propose a new approach for discovering classification rules using gene expression programming (GEP), a relatively new form of genetic programming (GP) that uses a linear representation. The antecedent of a discovered rule may involve many different combinations of attributes. To guide the search process, we propose a fitness function that considers both rule consistency gain and completeness. A multiclass classification problem is formulated as multiple two-class problems using the one-against-all learning method. Where applicable, a covering strategy is used to learn multiple rules for each class. Compact rule sets are then evolved using a two-phase pruning method based on the minimum description length (MDL) principle and the integration theory. Our approach is also noise tolerant and handles both numeric and nominal attributes. Experiments on several benchmark data sets show up to a 20% improvement in validation accuracy compared with C4.5. Furthermore, the proposed GEP approach is more efficient and tends to generate shorter solutions than canonical tree-based GP classifiers.
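GEP stores each individual as a fixed-length linear string (a K-expression) that is decoded breadth-first, level by level, into an expression tree. The sketch below illustrates that decoding for a rule antecedent. It assumes a Boolean function set of `AND`, `OR` and `NOT` and terminals that are simple attribute tests; the function set, the terminal names `t0`, `t1`, `t2`, and the attributes `age`, `color` and `income` are hypothetical illustrations, not the encoding used in the paper.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical function set (symbol -> arity); the paper's actual set may differ.
ARITY: Dict[str, int] = {"AND": 2, "OR": 2, "NOT": 1}

# Hypothetical terminals: each one is an attribute test on a record (a dict).
TERMINALS: Dict[str, Callable[[dict], bool]] = {
    "t0": lambda r: r["age"] > 40,
    "t1": lambda r: r["color"] == "red",
    "t2": lambda r: r["income"] <= 30_000,
}


@dataclass
class Node:
    symbol: str
    children: List["Node"] = field(default_factory=list)


def tail_length(head_len: int, max_arity: int) -> int:
    # Standard GEP rule t = h * (n - 1) + 1: enough terminals for any head.
    return head_len * (max_arity - 1) + 1


def decode_karva(gene: List[str]) -> Node:
    """Decode a K-expression breadth-first into an expression tree."""
    root = Node(gene[0])
    pending = deque([root])  # nodes whose arguments are still unfilled, in BFS order
    pos = 1
    while pending:
        node = pending.popleft()
        for _ in range(ARITY.get(node.symbol, 0)):  # terminals have arity 0
            child = Node(gene[pos])
            pos += 1
            node.children.append(child)
            pending.append(child)
    return root  # symbols from gene[pos:] onward are non-coding


def evaluate(node: Node, record: dict) -> bool:
    """Return True if the decoded antecedent holds for this record."""
    if node.symbol == "AND":
        return evaluate(node.children[0], record) and evaluate(node.children[1], record)
    if node.symbol == "OR":
        return evaluate(node.children[0], record) or evaluate(node.children[1], record)
    if node.symbol == "NOT":
        return not evaluate(node.children[0], record)
    return TERMINALS[node.symbol](record)


def to_string(node: Node) -> str:
    if not node.children:
        return node.symbol
    return f"{node.symbol}(" + ", ".join(to_string(c) for c in node.children) + ")"


# Head length h = 3 and max arity n = 2 give a tail of 4, so the gene has 7 symbols.
gene = ["AND", "t0", "NOT", "t1", "t2", "t0", "t1"]
tree = decode_karva(gene)
print(to_string(tree))  # AND(t0, NOT(t1))
print(evaluate(tree, {"age": 50, "color": "blue", "income": 20_000}))  # True
```

The tail-length rule guarantees the decoder never reads past the end of the gene, so any mutation that keeps function symbols out of the tail still yields a valid tree. Trailing symbols the decoder never reaches (here `t2`, `t0`, `t1`) form a non-coding region.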
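The fitness, one-against-all and covering steps can be illustrated with a small sketch. The abstract does not give the exact fitness formula, so this assumes completeness = TP/P, consistency = TP/(TP+FP), a consistency gain that normalizes consistency against the class prior P/(P+N) and is clipped at zero, and fitness = gain × completeness. A fixed dictionary of candidate rules stands in for the GEP search, and the covering loop removes only the positives a chosen rule covers. All of these choices, and the names `fitness` and `learn_class_rules`, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Rule = Callable[[dict], bool]


def fitness(rule: Rule, X: Sequence[dict], is_pos: Sequence[bool]) -> float:
    """ASSUMED fitness: consistency gain x completeness (not the paper's exact formula)."""
    P = sum(is_pos)            # positives (target class)
    N = len(is_pos) - P        # negatives (all other classes)
    if P == 0 or N == 0:
        return 0.0
    tp = sum(1 for x, pos in zip(X, is_pos) if pos and rule(x))
    fp = sum(1 for x, pos in zip(X, is_pos) if not pos and rule(x))
    if tp == 0:
        return 0.0
    prior = P / (P + N)                  # precision of a rule that covers everything
    consistency = tp / (tp + fp)         # precision of this rule
    gain = max(0.0, (consistency - prior) / (1.0 - prior))
    completeness = tp / P                # recall of this rule
    return gain * completeness


def learn_class_rules(candidates: Dict[str, Rule], X: Sequence[dict],
                      y: Sequence[str], target: str, max_rules: int = 5) -> List[str]:
    """One-against-all relabelling plus a greedy covering loop for one class."""
    pos_rem = [label == target for label in y]  # target class vs. all other classes
    X_rem = list(X)
    chosen: List[str] = []
    while any(pos_rem) and len(chosen) < max_rules:
        # Stand-in for a GEP run: pick the fittest rule from a fixed candidate pool.
        name, best = max(((n, fitness(r, X_rem, pos_rem)) for n, r in candidates.items()),
                         key=lambda pair: pair[1])
        if best <= 0.0:
            break
        chosen.append(name)
        rule = candidates[name]
        # Covering: drop the positives this rule already explains; keep every negative.
        keep = [not (pos and rule(x)) for x, pos in zip(X_rem, pos_rem)]
        X_rem = [x for x, k in zip(X_rem, keep) if k]
        pos_rem = [pos for pos, k in zip(pos_rem, keep) if k]
    return chosen


# Toy data: class "a" sits at both ends of the range, so it needs two rules.
X = [{"x": v} for v in (1, 2, 3, 7, 8, 9, 15, 16)]
Y = ["a", "a", "a", "b", "b", "b", "a", "a"]
CANDIDATES: Dict[str, Rule] = {
    "x<=3": lambda r: r["x"] <= 3,
    "x>=15": lambda r: r["x"] >= 15,
    "x>=7": lambda r: r["x"] >= 7,
    "x<=8": lambda r: r["x"] <= 8,
}
print(learn_class_rules(CANDIDATES, X, Y, "a"))  # ['x<=3', 'x>=15']
print(learn_class_rules(CANDIDATES, X, Y, "b"))  # ['x>=7']
```

Clipping the gain at zero means a rule that is no more precise than the class prior scores nothing, however many examples it covers, while the completeness factor keeps the search from favoring tiny, perfectly consistent rules.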