Journal: Energy education science and technology

A hybrid approach for improving the accuracy of classification algorithms in data mining

Abstract

Classification and rule induction are two important methods for extracting knowledge from data. In rule induction, knowledge is represented as IF-THEN rules, which problem-domain experts can readily understand and apply. Classification uses supervised learning to assign the objects of a large data set, each described by a set of attributes, to predefined classes. This study presents a new classification algorithm, RES (Rule Extraction System), for automatic knowledge acquisition in data mining. It aims to eliminate the pitfalls and disadvantages of the techniques and algorithms currently in use. The proposed algorithm extracts rules directly rather than through a decision tree, using a set of examples to induce general rules. In this study, the rule base is created by applying RES to sample sets from the Wisconsin Breast Cancer, Ljubljana Breast Cancer, Dermatology, Hepatitis, Iris, Tic-Tac-Toe, Nursery, Lymphography, CRX and Diabetes data sets, which are real-life data commonly used in machine learning. The accuracy of RES was compared with that of algorithms widely used in this field, such as C4.5, Naive Bayes, PART, CN2, CORE and GA-SVM. The proposed algorithm showed promising results.
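The abstract does not describe how RES works internally, so the following is only a minimal sketch of the general idea of inducing IF-THEN rules directly from examples, without building a decision tree. It uses a generic greedy covering loop with single-condition rules, which is an assumption made for illustration and not the RES algorithm. The function names (`learn_rules`, `classify`) and the toy attributes (`outlook`, `windy`, `play`) are hypothetical.

```python
from collections import Counter

# Hypothetical toy data; not one of the data sets named in the abstract.
EXAMPLES = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
]


def learn_rules(examples, target):
    """Greedy covering with single-condition rules (illustrative, not RES).

    Repeatedly picks the attribute=value test that covers the most
    remaining examples while matching only one class, emits it as a rule,
    and removes the examples it covers.
    """
    remaining = list(examples)
    rules = []  # each rule is (attribute, value, predicted_label)
    while remaining:
        best = None  # (coverage, attribute, value, label)
        attributes = sorted(k for k in remaining[0] if k != target)
        for attr in attributes:
            for value in sorted({ex[attr] for ex in remaining}):
                covered = [ex for ex in remaining if ex[attr] == value]
                labels = {ex[target] for ex in covered}
                if len(labels) == 1:  # the test covers a single class only
                    candidate = (len(covered), attr, value, labels.pop())
                    if best is None or candidate[0] > best[0]:
                        best = candidate
        if best is None:  # no pure single-condition rule is left
            break
        _, attr, value, label = best
        rules.append((attr, value, label))
        remaining = [ex for ex in remaining if ex[attr] != value]
    # Fall back to the majority class of whatever could not be covered.
    pool = remaining or examples
    default = Counter(ex[target] for ex in pool).most_common(1)[0][0]
    return rules, default


def classify(rules, default, example):
    """Return the label of the first rule whose condition matches."""
    for attr, value, label in rules:
        if example.get(attr) == value:
            return label
    return default


rules, default = learn_rules(EXAMPLES, "play")
for attr, value, label in rules:
    print(f"IF {attr} = {value} THEN play = {label}")
```

Each learned rule reads directly as an IF-THEN statement, which is the property the abstract highlights as making rule bases easy for domain experts to inspect. A practical rule learner would also need multi-condition rules, handling of noisy or numeric attributes, and pruning, all of which this sketch omits.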
