Journal: Parallel Computing

Parallel tree-projection-based sequence mining algorithms

Abstract

Discovery of sequential patterns is becoming increasingly useful and essential in many scientific and commercial domains. The enormous size of available datasets and the possibly large number of mined patterns demand efficient, scalable, and parallel algorithms. Even though a number of algorithms have been developed to efficiently parallelize frequent pattern discovery algorithms that are based on the candidate-generation-and-counting framework, the problem of parallelizing the more efficient projection-based algorithms has received relatively little attention, and existing parallel formulations have been targeted only toward shared-memory architectures. The irregular and unstructured nature of the task graph generated by these algorithms, and the fact that these tasks operate on overlapping sub-databases, make it challenging to parallelize these algorithms efficiently on scalable distributed-memory parallel computing architectures. In this paper we present and study a variety of distributed-memory parallel algorithms for a tree-projection-based frequent sequence discovery algorithm that are able to minimize the various overheads associated with load imbalance, database overlap, and interprocessor communication. Our experimental evaluation on a 32-processor IBM SP shows that these algorithms are capable of achieving good speedups, substantially reducing the amount of work required to find sequential patterns in large databases.
