PFPMine: A parallel approach for discovering interacting data entities in data-intensive cloud workflows

Yuze Huang; Jiwei Huang; Cong Liu; Chengning Zhang

Abstract

With the evolution of cloud computing, communities and companies have deployed their workflows on the cloud to support end-to-end business processes that are usually syndicated with other external services. To improve system efficiency and reduce energy consumption, data placement and backup strategies should be carefully designed. One of the most challenging problems is the discovery of interacting data entities in data-intensive workflows. To tackle this challenge, this paper presents a frequent pattern-based approach named FPMine for interacting data entity discovery in cloud workflows. A direct discriminative mining algorithm is first proposed to determine the minimum support threshold, based on which an FP-tree is constructed to formulate the frequent item pairs. Next, an FP-matrix is applied to avoid traversing the FP-trees during data entity discovery, and a pruning approach is introduced to reduce the redundancy of frequent item pairs. Furthermore, we propose a parallel data entity mining algorithm using the MapReduce framework, namely PFPMine, and then design a primitive data placement and backup strategy. Finally, we evaluate our approach by experiments on real-life data, which show that it can efficiently discover interacting data entities in cloud workflows. Compared with the traditional FP-growth approach, we pay only a multiplicative factor to extract fine-grained frequent item pairs rather than frequent patterns, which can bring significant advantages to data placement. After parallelization, the PFPMine algorithm outperforms FP-growth on both sparse and dense datasets. The results show that PFPMine reduces the running time by at least 25% and performs significantly more efficiently than the FP-growth approach.
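
The idea of mining frequent item pairs in a map/reduce style can be illustrated with a minimal sketch. This is not the PFPMine algorithm from the paper: it uses plain pair counting instead of an FP-tree or FP-matrix, takes the minimum support threshold as a given parameter rather than deriving it, and runs in a single process. The function names (map_partition, reduce_counts, frequent_pairs), the sample data entity names (d1 to d4), and the threshold value are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def map_partition(transactions):
    """Map step (hypothetical): count co-occurring item pairs in one partition."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so every pair has a single canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step (hypothetical): merge pair counts from two partitions."""
    return left + right


def frequent_pairs(partitions, min_support):
    """Keep only pairs whose total count reaches the support threshold."""
    total = reduce(reduce_counts, (map_partition(p) for p in partitions), Counter())
    return {pair: n for pair, n in total.items() if n >= min_support}


# Each transaction lists the data entities accessed together by one workflow run.
# Data and threshold are invented for this example.
partitions = [
    [["d1", "d2", "d3"], ["d1", "d2"]],
    [["d2", "d3"], ["d1", "d2", "d4"]],
]
result = frequent_pairs(partitions, min_support=2)
print(result)  # {('d1', 'd2'): 3, ('d2', 'd3'): 2}
```

In this toy setting, the pairs that survive the threshold would be the candidates for co-location under a placement strategy; the paper's FP-tree, FP-matrix, and pruning steps are what make the same discovery tractable at scale.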

Bibliographic details

Source
Future Generation Computer Systems | 2020, Issue 12 | pp. 474-487 | 14 pages
Authors
Yuze Huang; Jiwei Huang; Cong Liu; Chengning Zhang
Author affiliations

School of Information Science and Engineering Chongqing Jiaotong University Chongqing 400074 China;

Department of Computer Science and Technology China University of Petroleum-Beijing Beijing 102249 China Beijing Key Laboratory of Petroleum Data Mining China University of Petroleum-Beijing Beijing 102249 China;

College of Computer Science and Technology Shandong University of Technology Zibo 255300 China;

Grab Company Singapore 573972 Singapore;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Text language: English
Keywords
Data entity discovery; MapReduce; Data-intensive workflow; Cloud computing;

Similar literature

1. Data-intensive workflow management: for clouds and data-intensive and scalable computing environments [J]. Balint Molnar. Computing reviews. 2021, Issue 1
2. WaaS: Workflow-as-a-Service for the Cloud with Scheduling of Continuous and Data-Intensive Workflows [J]. Sergio Esteves, Luis Veiga. The Computer journal. 2016, Issue 3
3. Accelerating Data-Intensive Applications: a Cloud Computing Approach to Parallel Image Pattern Recognition Tasks [C]. Liangxiu Han, Tantana Saengngam, Jano van Hemert. International Conference on Advanced Engineering Computing and Applications in Sciences. 2010
4. A data-intensive approach to named entity recognition using domain and language independent methods [D]. Osesina, Olukayode Isaac. 2010
5. Hybrid Clouds for Data-Intensive 5G-Enabled IoT Applications: An Overview Key Issues and Relevant Architecture [O]. Panagiotis Trakadas, Nikolaos Nomikos, Emmanouel T. Michailidis. 2019
6. Raw data queries during data-intensive parallel workflow execution [O]. Silva, Vítor; Leite, José; Camata, José. 2017
