Journal: International Journal of Engineering Science and Technology

Date Classification through integration of Sequential process involving Data cleaning, attribute oriented induction, Relevance analysis as preprocessor to induction of decision tree USING RELATIONAL DATABASE

Abstract

Classification, i.e., classifying unknown values of certain attributes of interest based on the values of other attributes, is a major task in data mining. A well-accepted classification method is decision tree induction. However, because this approach performs classification on primitive data stored in the database, it inherits problems such as difficulty handling large amounts of data and continuous numerical values, and a tendency to favor many-valued attributes when selecting the determinant attribute. Moreover, the efficiency of existing decision tree algorithms has been well established only for relatively small data sets, whereas very large training sets are common in data mining applications; this restriction limits the scalability of such algorithms. In addition, in most data mining applications, users have little knowledge of which attributes should be selected for effective mining. In this paper, we address the above issues by proposing a data classification method that integrates data cleaning, attribute-oriented induction, relevance analysis, and the induction of decision trees. This method extracts rules at multiple levels of abstraction and handles large data sets and continuous numerical values in a scalable way.
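
The four stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy records, the `HIERARCHY` mapping, the age cut-off of 30, the information-gain threshold, and all function names are assumptions made up for illustration, and a plain ID3-style tree stands in for whichever induction algorithm the paper actually uses.

```python
import math
from collections import Counter

# Hypothetical concept hierarchy: primitive city values -> country concepts.
HIERARCHY = {"Vancouver": "Canada", "Toronto": "Canada",
             "Seattle": "USA", "Boston": "USA"}

def clean(rows):
    """Data cleaning (assumed here): drop records with a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def generalize(rows):
    """Attribute-oriented induction (sketch): replace primitive values
    with higher-level concepts and bin the continuous age attribute."""
    out = []
    for r in rows:
        g = dict(r)
        g["city"] = HIERARCHY.get(r["city"], "other")
        g["age"] = "young" if r["age"] < 30 else "senior"  # assumed cut-off
        out.append(g)
    return out

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def info_gain(rows, attr, target):
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - remainder

def relevant_attributes(rows, attrs, target, threshold=0.01):
    """Relevance analysis (sketch): keep attributes whose information
    gain exceeds an assumed threshold."""
    return [a for a in attrs if info_gain(rows, a, target) > threshold]

def build_tree(rows, attrs, target):
    """ID3-style induction on the generalized, filtered data."""
    labels = Counter(r[target] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}

# Toy records, invented for this example only.
raw = [
    {"city": "Vancouver", "age": 22, "buys": "yes"},
    {"city": "Toronto", "age": 45, "buys": "no"},
    {"city": "Seattle", "age": 27, "buys": "yes"},
    {"city": "Boston", "age": 51, "buys": "no"},
    {"city": "Seattle", "age": None, "buys": "yes"},  # removed by cleaning
]

cleaned = clean(raw)
generalized = generalize(cleaned)
relevant = relevant_attributes(generalized, ["city", "age"], "buys")
tree = build_tree(generalized, relevant, "buys")
print(tree)  # {'age': {'senior': 'no', 'young': 'yes'}}
```

In this toy run, generalization turns the continuous `age` into two concepts, relevance analysis discards `city` because it carries no information about `buys`, and the tree is induced on the much smaller generalized relation rather than on the primitive data.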
