Frequent sets by leap-traversal for sequential and parallel paradigms.

Abstract

Efficient discovery of frequent patterns from large databases is an active research area in data mining, with broad applications in industry and deep implications for many areas of data mining. Although many efficient frequent-pattern mining techniques have been developed in the last decade, most of them assume relatively small databases, leaving extremely large but realistic datasets out of reach. Even when it is computationally feasible, mining extremely large databases produces tremendously large numbers of frequent patterns. Mining for frequent itemsets can generate an overwhelming number of patterns, often exceeding the size of the original transactional database. In many cases it is impractical to mine those datasets because of their sheer size: not only because of the number of existing patterns, but mainly because of the magnitude of the search space.

In this research we propose a new traversal approach that jumps in the search space among only promising nodes. Our leaping approach avoids nodes that would not participate in the answer set and drastically reduces the number of candidate patterns. We use this approach to efficiently pinpoint maximal patterns at the border of the frequent patterns in the lattice, and to collect enough information in the process to generate all subsequent patterns.

Using this approach, we sequentially mined databases of sizes never reported before. We also generated different types of patterns and pushed constraints efficiently to filter the answer set down to only the patterns that are of interest to decision makers.

To open the door to the mining of extremely large databases, parallelizing the search for frequent patterns plays an important role. Not all good sequential algorithms can be effectively parallelized, and parallelization alone is not enough. An algorithm has to be well suited for parallelization, and in the case of frequent-pattern mining, clever search methods are certainly an advantage. The algorithm we propose for parallel mining of frequent maximal patterns is based on our new technique for astutely jumping within the search space and, more importantly, is composed of autonomous task segments that can be performed separately and thus minimize communication between processors. Our parallel algorithm for mining frequent patterns generates all types of patterns and supports constraint pushing. This approach allows the mining, in a reasonable time, of databases on the order of a billion transactions using relatively inexpensive clusters.
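The relationship the abstract relies on, that the maximal patterns at the border of the lattice determine the whole set of frequent patterns, can be illustrated with a minimal Python sketch. This is not the thesis's leap-traversal algorithm: it enumerates the lattice by brute force, and the function names, the toy transactions, and the support threshold are all assumptions made for illustration only.

```python
from itertools import combinations


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all frequent itemsets (illustration only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = support(candidate, transactions)
            if count >= min_support:
                result[candidate] = count
    return result


def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    return {s for s in frequent if not any(s < other for other in frequent)}


def expand_from_maximal(maximal):
    """Regenerate every frequent itemset as a non-empty subset of a maximal one."""
    subsets = set()
    for m in maximal:
        for k in range(1, len(m) + 1):
            for combo in combinations(sorted(m), k):
                subsets.add(frozenset(combo))
    return subsets


# Hypothetical toy database: five transactions over the items A, B, C, D.
transactions = [
    frozenset("ABC"),
    frozenset("ABC"),
    frozenset("ABD"),
    frozenset("CD"),
    frozenset("AB"),
]

frequent = frequent_itemsets(transactions, min_support=2)
maximal = maximal_itemsets(frequent)
recovered = expand_from_maximal(maximal)
```

In this toy database the maximal patterns are {A, B, C} and {D}, and their non-empty subsets are exactly the eight frequent itemsets. The sketch recovers only which itemsets are frequent; recovering their support counts requires additional information, which the abstract says the traversal collects along the way.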

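Bibliographic record

  • Author

    El-Hajj, Mohammad Omar

  • Author affiliation

    University of Alberta (Canada)

  • Degree-granting institution: University of Alberta (Canada)
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 125 p.
  • Total pages: 125
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology
  • Keywords

  • Date added to database: 2022-08-17 11:39:47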
