Conference: International Conference on Recent Trends in Information Technology

A high speed decision tree classifier algorithm for huge dataset

Abstract

Knowledge discovery is an important tool for intelligent businesses to transform data into useful information that increases business revenue. Data mining techniques support automatic exploration of data, attempt to classify the patterns and trends in the data, and infer decision rules from those patterns. Classification, a supervised machine learning procedure, is an important function of data mining. The scalability and efficiency of a classifier algorithm become a major concern when the dataset is large and the algorithm requires many passes over it. In this paper, we present a scalable decision tree algorithm for classifying large datasets with high processing speed, which requires only one scan over the dataset. It overcomes a drawback of the RainForest algorithm, which addresses the scalability issue but requires a pass over the dataset at each level of decision tree construction. The proposed algorithm significantly reduces the I/O cost and requires only a one-time sort of the numerical attributes, which leads to better execution time. According to the experimental results, our algorithm takes less execution time than the RainForest algorithm and can be used with any attribute selection method, by which the accuracy of the decision tree can be improved.
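
The abstract does not describe the algorithm's internals, so the following is only background illustration, not the paper's method. RainForest-style classifiers summarize the data at a tree node into per-attribute tables of (attribute value, class label) counts, often called AVC-sets, and choose a split from those counts instead of from the raw records. A minimal Python sketch, assuming categorical attributes only, tuples as records, and information gain as one possible attribute selection method; all function names here are hypothetical:

```python
import math
from collections import Counter, defaultdict


def build_avc_sets(rows, label_index):
    """Scan the rows once and count class labels per attribute value.

    Returns (avc, class_counts) where avc[attr][value] is a Counter of labels.
    """
    avc = defaultdict(lambda: defaultdict(Counter))
    class_counts = Counter()
    for row in rows:  # the only pass over the data
        label = row[label_index]
        class_counts[label] += 1
        for attr, value in enumerate(row):
            if attr != label_index:
                avc[attr][value][label] += 1
    return avc, class_counts


def entropy(counts):
    """Shannon entropy (in bits) of a Counter of class labels."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def information_gain(avc_set, class_counts):
    """Gain of splitting on one attribute, computed from its counts alone."""
    total = sum(class_counts.values())
    remainder = sum(
        (sum(labels.values()) / total) * entropy(labels)
        for labels in avc_set.values()
    )
    return entropy(class_counts) - remainder


def best_attribute(rows, label_index):
    """Pick the attribute index with the highest information gain (root node only)."""
    avc, class_counts = build_avc_sets(rows, label_index)
    return max(avc, key=lambda attr: information_gain(avc[attr], class_counts))


if __name__ == "__main__":
    # Toy records: (outlook, temperature, class label); made up for illustration.
    data = [
        ("sunny", "hot", "no"),
        ("sunny", "mild", "no"),
        ("rainy", "hot", "yes"),
        ("rainy", "mild", "yes"),
    ]
    print(best_attribute(data, label_index=2))
```

The sketch covers the root node only. Numerical attributes, which the abstract says are sorted once, and the construction of deeper tree levels without rescanning are not shown, because the abstract gives no detail on how the paper handles them.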