首页> 外文学位 >Fast algorithms for mining association rules and sequential patterns.
【24h】

Fast algorithms for mining association rules and sequential patterns.

机译:用于挖掘关联规则和顺序模式的快速算法。

获取原文
获取原文并翻译 | 示例

摘要

This dissertation presents fast algorithms for mining associations in large datasets. An example of an association rule may be "(30% of customers who buy jackets and gloves also buy hiking boots." The problem is to find all such rules whose frequency is greater than some user-specified minimum. We first present a new algorithm, Apriori, for mining associations between boolean attributes (called items). Empirical evaluation shows that Apriori is typically 3 to 10 times faster than previous algorithms, with the performance gap increasing with the problem size. In many domains, taxonomies (isa hierarchies) on the items are common. For example, a taxonomy may say that jackets isa outerwear isa clothes. We extend the Apriori algorithm to find associations between items at any level of the taxonomy.; Next, we consider associations between quantitative and categorical attributes, not just boolean attributes. We deal with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary. We also introduce measures of partial completeness which quantify the information loss due to partitioning. These measures can be used to determine the number of partitions for each attribute. We enhance the Apriori algorithm to efficiently find quantitative associations.; Finally, we consider associations over time, called sequential patterns. An example of such a sequential pattern may be "2% of customers bought 'Ringworld' in one transaction, followed by 'Foundation' and 'Ringworld Engineers' in a later transaction". We allow time constraints between elements of the sequential patterns, and also allow all the items in an element to be present in a sliding time window rather than at a single point in time. We present GSP, an algorithm for finding such patterns, based on intuitions similar to those for the Apriori algorithm.; All the above algorithms scale linearly with the size of the data (for constant data characteristics). These algorithms have been used in a variety of domains, including market basket analysis, attached mailing, fraud detection and medical research. We conclude the dissertation with directions for future work.
机译:本文提出了一种在大型数据集中挖掘关联的快速算法。关联规则的一个示例可能是“((购买夹克和手套的顾客中有30%的人还购买了远足靴。”问题是要找到所有频率大于用户指定的最小值的规则)。我们首先提出一种新算法Apriori,用于挖掘布尔属性(称为项)之间的关联。经验评估表明,Apriori通常比以前的算法快3至10倍,并且性能差距随着问题的大小而增加。在许多领域中,分类法(isa层次结构)例如,一个分类法可以说夹克是一件外套,一件衣服是服装。我们扩展了Apriori算法,以查找任何分类法级别的物品之间的关联;接下来,我们考虑的是数量和分类属性之间的关联,而不仅仅是布尔属性:我们通过对属性的值进行精细划分,然后根据需要合并相邻分区来处理定量属性,还介绍了pa的度量空间完整性,可以量化由于分区造成的信息丢失。这些度量可用于确定每个属性的分区数。我们增强了Apriori算法,以有效地找到定量关联。最后,我们考虑一段时间内的关联,称为顺序模式。这种顺序模式的示例可能是“有2%的客户在一笔交易中购买了'Ringworld',然后在随后的交易中购买了'Foundation'和'Ringworld Engineers'”。我们允许顺序模式元素之间的时间限制,并且还允许元素中的所有项目出现在滑动时间窗口中,而不是出现在单个时间点上。我们基于类似于Apriori算法的直觉,提出了GSP,一种用于发现这种模式的算法。以上所有算法均随数据大小线性扩展(用于恒定数据特性)。这些算法已在各种领域中使用,包括市场篮分析,附加邮件,欺诈检测和医学研究。总结全文,为今后的工作指明方向。

著录项

  • 作者

    Srikant, Ramakrishnan.;

  • 作者单位

    The University of Wisconsin - Madison.;

  • 授予单位 The University of Wisconsin - Madison.;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 1996
  • 页码 168 p.
  • 总页数 168
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类 自动化技术、计算机技术;
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号