Conference: International Conference on Extending Database Technology

SLIQ: A Fast Scalable Classifier for Data Mining

Abstract

Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most classification algorithms are designed only for memory-resident data, which limits their suitability for mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree-growing strategy to enable classification of disk-resident data sets. SLIQ also uses a new tree-pruning algorithm that is inexpensive and produces compact, accurate trees. The combination of these techniques enables SLIQ to scale to large data sets and to classify data sets regardless of the number of classes, attributes, and examples (records), making it an attractive tool for data mining.
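The idea of pairing a one-time pre-sort with breadth-first growth can be illustrated with a minimal Python sketch. It goes beyond what the abstract states and rests on several assumptions. Each attribute is sorted once into a list of (value, record id) pairs. A separate class list maps each record id to its class label and current leaf. Candidate splits are scored with the Gini index. The function names `presort` and `best_numeric_splits` and the toy records are hypothetical. In-memory lists stand in for disk-resident data, and only numeric attributes are handled, so categorical splits and pruning are omitted.

```python
from collections import defaultdict


def gini(counts):
    """Gini index 1 - sum_j p_j^2 over a {class: count} mapping (assumed split criterion)."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def presort(records, attr):
    """Hypothetical helper: attribute list of (value, record_id), sorted once by value."""
    return sorted((rec[attr], rid) for rid, rec in enumerate(records))


def best_numeric_splits(attr_list, class_list):
    """Find the best split 'attr <= v' for every leaf in a single scan.

    class_list[rid] = (class_label, leaf_id). Because attr_list is already
    sorted, one pass visits each leaf's records in value order, so all
    leaves at the current tree level are evaluated together (breadth-first).
    Returns {leaf_id: (weighted_gini, threshold)}.
    """
    # Class histogram of each leaf before the scan (everything on the right).
    total = defaultdict(lambda: defaultdict(int))
    for label, leaf in class_list:
        total[leaf][label] += 1

    below = defaultdict(lambda: defaultdict(int))  # per-leaf counts with value <= v
    prev = {}   # last attribute value seen in each leaf
    best = {}
    for value, rid in attr_list:
        label, leaf = class_list[rid]
        # The value increased for this leaf, so 'attr <= prev' is a candidate split.
        if leaf in prev and value > prev[leaf]:
            left = below[leaf]
            right = {c: total[leaf][c] - left.get(c, 0) for c in total[leaf]}
            n_left, n = sum(left.values()), sum(total[leaf].values())
            score = (n_left / n) * gini(left) + ((n - n_left) / n) * gini(right)
            if leaf not in best or score < best[leaf][0]:
                best[leaf] = (score, prev[leaf])
        below[leaf][label] += 1
        prev[leaf] = value
    return best


if __name__ == "__main__":
    # Hypothetical toy data: one numeric attribute and a two-valued class.
    records = [
        {"age": 23, "cls": "High"}, {"age": 17, "cls": "High"},
        {"age": 43, "cls": "High"}, {"age": 68, "cls": "Low"},
        {"age": 32, "cls": "Low"}, {"age": 20, "cls": "High"},
    ]
    ages = presort(records, "age")
    # All records start in leaf 0 (the root).
    print(best_numeric_splits(ages, [(r["cls"], 0) for r in records]))
```

On the toy data, the root's best split is `age <= 23`, with a weighted Gini of about 0.222. In this sketch the attribute list is sorted only once. Every later scan reuses that order and serves all leaves at the current depth. The number of passes over each attribute list therefore grows with tree depth rather than with the number of nodes.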