PFPMine: A parallel approach for discovering interacting data entities in data-intensive cloud workflows

Abstract

With the evolution of cloud computing, communities and companies have deployed their workflows on the cloud to support end-to-end business processes that are usually syndicated with other external services. To improve the efficiency of the system and reduce energy consumption, data placement and backup strategies should be carefully designed. One of the most challenging problems is the discovery of interacting data entities in data-intensive workflows. To tackle this challenge, this paper presents a frequent pattern-based approach named FPMine for interacting data entity discovery in cloud workflows. A direct discriminative mining algorithm is first proposed to determine the minimum support threshold, based on which an FP-tree is constructed to formulate the frequent item pairs. Next, an FP-matrix is applied to avoid traversing the FP-trees during data entity discovery, and a pruning approach is introduced to reduce the redundancy of frequent item pairs. Furthermore, we propose a parallel data entity mining algorithm using the MapReduce framework, namely PFPMine, and then design a primitive data placement and backup strategy. Finally, we evaluate the efficiency of our approach through experiments on real-life data, which show that our approach can efficiently discover interacting data entities in cloud workflows. Compared with the traditional FP-growth approach, we pay only a multiplicative factor to make our approach able to extract fine-grained frequent item pairs rather than frequent patterns, which can bring significant advantages to data placement. After parallelization, the PFPMine algorithm performs more efficiently than FP-growth on both sparse and dense datasets. The results show that PFPMine can reduce the running time by at least 25% and performs with significantly higher efficiency than the FP-growth approach.
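
The idea of mining frequent item pairs in a map/reduce style can be illustrated with a minimal sketch. This is not the PFPMine algorithm from the paper: it uses plain brute-force pair counting in place of the FP-tree, FP-matrix, and pruning steps, and it takes the minimum support as a given parameter rather than deriving it by discriminative mining. The function names (`map_pairs`, `reduce_counts`, `frequent_pairs`) and the sample data are hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_pairs(transactions):
    """Map step: count co-occurring item pairs within one partition."""
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so (a, b) and (b, a) share one key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge partial pair counts from two partitions."""
    return left + right


def frequent_pairs(partitions, min_support):
    """Return the pairs whose total count reaches min_support."""
    total = reduce(reduce_counts, map(map_pairs, partitions), Counter())
    return {pair: n for pair, n in total.items() if n >= min_support}


# Hypothetical data: each inner list names the data entities one task accessed,
# and each outer list is one partition handled by a separate mapper.
partitions = [
    [["d1", "d2", "d3"], ["d1", "d2"]],
    [["d2", "d3"], ["d1", "d2", "d4"]],
]
result = frequent_pairs(partitions, min_support=2)
print(result)
```

Pairs that survive the threshold (here `d1`/`d2` and `d2`/`d3`) are candidates for co-location under a placement strategy; in a real deployment the map and reduce steps would run on separate workers rather than in one process.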

Bibliographic details

  • Source
    Future Generation Computer Systems | 2020, issue 12 | pp. 474-487 | 14 pages
  • Author affiliations

    School of Information Science and Engineering Chongqing Jiaotong University Chongqing 400074 China;

    Department of Computer Science and Technology China University of Petroleum-Beijing Beijing 102249 China; Beijing Key Laboratory of Petroleum Data Mining China University of Petroleum-Beijing Beijing 102249 China;

    College of Computer Science and Technology Shandong University of Technology Zibo 255300 China;

    Grab Company Singapore 573972 Singapore;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Data entity discovery; MapReduce; Data-intensive workflow; Cloud computing;
