A load-balanced distributed parallel mining algorithm

Abstract

Due to the exponential growth in worldwide information, companies have to deal with an ever-growing amount of digital information. One of the most important challenges in data mining is finding relationships among data quickly and correctly. The Apriori algorithm has been the most popular technique for finding frequent patterns. However, applying this method requires scanning the database many times to count a huge number of candidate itemsets. Parallel and distributed computing is an effective strategy for accelerating the mining process. In this paper, the Distributed Parallel Apriori (DPA) algorithm is proposed as a solution to this problem. In the proposed method, metadata are stored in the form of Transaction Identifiers (TIDs), so that only a single scan of the database is needed. The approach also takes itemset counts into consideration, thus generating a balanced workload among processors and reducing processor idle time. Experiments on a PC cluster with 16 computing nodes show the performance of the proposed approach and compare it with some other parallel mining algorithms. The experimental results show that the proposed approach outperforms the others, especially when the minimum support is low.
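The two ideas named in the abstract, TID-based storage with a single database scan and workload balancing driven by itemset counts, can be illustrated with a minimal sketch. This is not the paper's DPA implementation, which the abstract does not specify. The function names (`build_tid_lists`, `frequent_itemsets`, `assign_balanced`), the greedy largest-first assignment, and the use of a plain count as the cost of a task are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
import heapq


def build_tid_lists(transactions):
    """Scan the database once, mapping each item to the set of TIDs containing it."""
    tid_lists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tid_lists[item].add(tid)
    return dict(tid_lists)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori over TID sets; support is the size of a TID intersection."""
    tid_lists = build_tid_lists(transactions)  # the only pass over the raw data
    current = {frozenset([item]): tids
               for item, tids in tid_lists.items() if len(tids) >= min_support}
    result = {}
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level = {}
        for a, b in combinations(list(current), 2):
            candidate = a | b
            # Keep only joins that grow the itemset by exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every subset one item smaller must be frequent.
            if any(candidate - {x} not in current for x in candidate):
                continue
            tids = current[a] & current[b]  # no rescan of the database
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level
    return result


def assign_balanced(costs, n_workers):
    """Hypothetical balancer: give each task, largest first, to the least-loaded worker."""
    heap = [(0, worker) for worker in range(n_workers)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(n_workers)]
    for task, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        buckets[worker].append(task)
        heapq.heappush(heap, (load + cost, worker))
    return buckets


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    supports = frequent_itemsets(db, min_support=3)
    # Assumed cost model: the support count of each frequent 1-itemset.
    costs = {next(iter(s)): n for s, n in supports.items() if len(s) == 1}
    print(assign_balanced(costs, n_workers=2))
```

In this sketch every support count after the first pass comes from intersecting TID sets, which is how a vertical layout avoids repeated database scans. The balancer is a standard greedy heuristic standing in for whatever count-based scheme the paper actually uses.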

Bibliographic record

  • Source
    Expert Systems with Applications | 2010, Issue 3 | pp. 2459-2464 | 6 pages
  • Author affiliations

    Department of Computer Science and Information Engineering, Chung Hua University, 707, Sec. 2, WuFu Rd., HsinChu 300, Taiwan, ROC;

    Institute of Engineering and Science, Chung Hua University, 707, Sec. 2, WuFu Rd., HsinChu 300, Taiwan, ROC;

    Department of Computer Science and Information Engineering, National University of Kaohsiung, 700, Kaohsiung University Rd, Kaohsiung 811, Taiwan, ROC;

    Department of Information Management, Chung Hua University, 707, Sec. 2, WuFu Rd., HsinChu 300, Taiwan, ROC;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    parallel and distributed processing; cluster computing; frequent patterns; association rules; data mining;

  • Date added to database: 2022-08-17 13:33:11
