
SLIQ: A Fast Scalable Classifier for Data Mining

Abstract

Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most classification algorithms are designed only for memory-resident data, which limits their suitability for mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree-growing strategy to enable classification of disk-resident data sets. SLIQ also uses a new tree-pruning algorithm that is inexpensive and results in compact, accurate trees. The combination of these techniques enables SLIQ to scale to large data sets and to classify data sets irrespective of the number of classes, attributes, and examples (records), making it an attractive tool for data mining.
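To make the pre-sorting and breadth-first ideas concrete, here is a minimal sketch under stated assumptions, not the paper's implementation. It assumes numeric attributes only, uses the gini index as the split criterion, and uses in-memory Python lists to stand in for disk-resident attribute lists. Each numeric attribute is sorted once into an attribute list of (value, record id) pairs, and a separate class list maps each record id to its class label and current leaf. One sequential scan per attribute list then evaluates candidate splits for every current leaf at the same time, so the number of scans per tree level does not depend on the number of leaves. The function names `gini`, `build_attribute_lists`, `best_splits`, and `apply_splits` are hypothetical, and so are the toy records.

```python
from collections import Counter, defaultdict


def gini(counts):
    """Gini impurity of a class histogram (Counter: label -> count)."""
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def build_attribute_lists(records, numeric_attrs):
    """Pre-sort once: per attribute, a list of (value, rid) sorted by value."""
    return {a: sorted((rec[a], rid) for rid, rec in enumerate(records))
            for a in numeric_attrs}


def best_splits(attr_lists, class_list):
    """Breadth-first split evaluation (hypothetical helper).

    class_list[rid] == [label, leaf_id]. Each attribute list is scanned once,
    and that single scan scores candidate tests "attr <= v" for ALL leaves.
    Returns {leaf_id: (weighted_gini, attr, v)}.
    """
    totals = defaultdict(Counter)  # full class histogram of each leaf
    for label, leaf in class_list:
        totals[leaf][label] += 1

    best = {}
    for attr, entries in attr_lists.items():
        below = defaultdict(Counter)  # per-leaf histogram of values seen so far
        last_value = {}               # per-leaf last value seen in this scan
        for value, rid in entries:
            label, leaf = class_list[rid]
            prev = last_value.get(leaf)
            if prev is not None and value > prev:
                # Candidate test "attr <= prev" for this leaf.
                left = below[leaf]
                right = totals[leaf] - left  # Counter subtraction drops zeros
                n_l, n_r = sum(left.values()), sum(right.values())
                score = (n_l * gini(left) + n_r * gini(right)) / (n_l + n_r)
                if leaf not in best or score < best[leaf][0]:
                    best[leaf] = (score, attr, prev)
            below[leaf][label] += 1
            last_value[leaf] = value
    return best


def apply_splits(attr_lists, class_list, splits, next_leaf):
    """Give each split leaf two fresh child ids, then update the class list
    by scanning the attribute list of each leaf's chosen split attribute."""
    children = {}
    for leaf, (_, attr, threshold) in sorted(splits.items()):
        children[leaf] = (attr, threshold, next_leaf, next_leaf + 1)
        next_leaf += 2
    for attr, entries in attr_lists.items():
        for value, rid in entries:
            leaf = class_list[rid][1]
            if leaf in children and children[leaf][0] == attr:
                _, threshold, left_id, right_id = children[leaf]
                class_list[rid][1] = left_id if value <= threshold else right_id
    return children, next_leaf


# Toy run (hypothetical data): every record starts in leaf 0.
records = [{"age": 30, "salary": 65}, {"age": 23, "salary": 15},
           {"age": 40, "salary": 75}, {"age": 55, "salary": 40},
           {"age": 55, "salary": 100}, {"age": 45, "salary": 60}]
labels = ["G", "B", "G", "B", "G", "G"]
attr_lists = build_attribute_lists(records, ["age", "salary"])
class_list = [[label, 0] for label in labels]
splits = best_splits(attr_lists, class_list)  # leaf 0 -> test "salary <= 40"
children, next_leaf = apply_splits(attr_lists, class_list, splits, 1)
```

In this sketch only the class list is accessed by record id. The attribute lists are only read sequentially, which is the property that would let them stay on disk while the tree grows level by level.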
