Journal: Decision Science Letters

Genetic algorithm rule based categorization method for textual data mining

Abstract

Rule-based categorization approaches such as associative classification can produce classifiers that rival those learned by traditional categorization approaches such as Naïve Bayes and K-nearest Neighbor. However, the difficulty of discovering and applying useful categorization rules is the major challenge for rule-based approaches, and their performance declines with large rule sets. Genetic Algorithm (GA) is effective at reducing high dimensionality and improving categorization performance. However, in most research the use of GA is limited to the categorization preprocessing stage, and its results are used to simplify the categorization process performed by other categorization algorithms. This paper proposes a hybrid GA rule-based categorization method, named genetic algorithm rule based categorization (GARC), to enhance the accuracy of categorization rules and to produce an accurate classifier for text mining. GARC consists of three main stages: search space determination, rule discovery with validation (rule generation), and categorization. Experiments are carried out on three Arabic text datasets with multiple categories to evaluate the efficiency of GARC. The results show that GARC achieves promising performance for Arabic text categorization, and that it achieves its best performance with a small feature space in most situations.
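The three stages named in the abstract can be illustrated with a small Python sketch. This is not the authors' GARC implementation: the abstract does not specify the rule encoding, fitness function, or genetic operators, so everything below is an assumption made for illustration. A rule is assumed to be a set of terms that fires when a document contains any of them, fitness is assumed to be F1 on the training documents, and the corpus `DOCS` and the function names (`search_space`, `fitness`, `evolve_rule`, `categorize`) are hypothetical.

```python
import random

# Hypothetical toy corpus: (set of terms, category). Illustration only.
DOCS = [
    ({"goal", "match", "team"}, "sport"),
    ({"team", "coach", "goal"}, "sport"),
    ({"match", "player", "score"}, "sport"),
    ({"bank", "market", "stock"}, "economy"),
    ({"stock", "price", "market"}, "economy"),
    ({"bank", "loan", "price"}, "economy"),
]


def search_space(docs, category, top_k=4):
    """Stage 1 (assumed): keep the top_k most frequent terms of the category."""
    counts = {}
    for terms, label in docs:
        if label == category:
            for term in terms:
                counts[term] = counts.get(term, 0) + 1
    # Sort by descending frequency, then alphabetically for determinism.
    return sorted(counts, key=lambda t: (-counts[t], t))[:top_k]


def fitness(mask, terms, docs, category):
    """Assumed fitness: F1 of the rule 'contains any chosen term => category'."""
    chosen = {t for t, bit in zip(terms, mask) if bit}
    tp = fp = fn = 0
    for doc_terms, label in docs:
        fires = bool(chosen & doc_terms)
        if fires and label == category:
            tp += 1
        elif fires:
            fp += 1
        elif label == category:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evolve_rule(docs, category, generations=30, pop_size=12, seed=0):
    """Stage 2 (assumed): evolve a bit mask over the search space with a GA."""
    rng = random.Random(seed)
    terms = search_space(docs, category)

    def score(mask):
        return fitness(mask, terms, docs, category)

    pop = [[rng.randint(0, 1) for _ in terms] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(terms))  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:              # occasional bit-flip mutation
                child[rng.randrange(len(terms))] ^= 1
            children.append(child)
        pop = parents + children
    best = max(pop, key=score)
    return {t for t, bit in zip(terms, best) if bit}, score(best)


def categorize(doc_terms, rules):
    """Stage 3 (assumed): pick the category whose rule shares the most terms."""
    return max(rules, key=lambda c: len(rules[c] & doc_terms))
```

A possible usage, under the same assumptions: build one rule per category with `rules = {c: evolve_rule(DOCS, c)[0] for c in ("sport", "economy")}` and then call `categorize({"stock", "market"}, rules)` on the terms of a new document. Restricting the GA to a small per-category search space mirrors the abstract's remark that a small feature space worked best in most situations, though the selection criterion used here (raw term frequency) is only a stand-in.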
