Flexible MapReduce Workflows for Cloud Data Analytics

Carlos Goncalves; Luis Assuncao; Jose C. Cunha

Abstract

Data analytics applications handle large data sets that pass through multiple processing phases, some of which can execute in parallel on clusters, grids, or clouds. Such applications can benefit from the MapReduce model, which only requires the end user to define the application algorithms for input data processing and the map and reduce functions. However, this requires installing and configuring specific frameworks such as Apache Hadoop or Elastic MapReduce in the Amazon cloud. To provide more flexibility in defining and adjusting application configurations, and in specifying the composition and orchestration of the application phases, the authors describe an approach for supporting MapReduce stages as sub-workflows in the AWARD (Autonomic Workflow Activities Reconfigurable and Dynamic) framework. The authors discuss how a text mining application is represented as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. Access to intermediate data produced during the MapReduce computations is supported by a data sharing abstraction. The authors describe two implementations of this abstraction, one based on a shared tuple space and another based on an in-memory distributed key/value store. The authors describe the implementation of the framework, a set of developed tools, and their experiments executing the text mining algorithm over multiple Amazon EC2 (Elastic Compute Cloud) instances, and report the speed-up and size-up results obtained with up to 20 EC2 instances and for different corpus sizes, up to 97 million words.
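As a rough illustration of the ideas in the abstract, the sketch below runs a word-count MapReduce stage whose intermediate data passes through a shared key/value store. It is not the AWARD API: the names SharedStore, map_words, reduce_counts, and run_stage are hypothetical, and the store is a plain in-process dictionary standing in for the tuple space or distributed key/value store the authors describe.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


class SharedStore:
    """Hypothetical stand-in for the data sharing abstraction.

    Assumption: a local dict replaces the shared tuple space or the
    in-memory distributed key/value store mentioned in the abstract.
    """

    def __init__(self) -> None:
        self._data: Dict[str, List[int]] = defaultdict(list)

    def append(self, key: str, value: int) -> None:
        # Record one intermediate value emitted by a map task.
        self._data[key].append(value)

    def keys(self) -> List[str]:
        return sorted(self._data)

    def get(self, key: str) -> List[int]:
        return list(self._data[key])


def map_words(document: str) -> Iterable[Tuple[str, int]]:
    # Map function: emit (word, 1) for every whitespace-separated token.
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    # Reduce function: sum all counts collected for one word.
    return word, sum(counts)


def run_stage(
    documents: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
    store: SharedStore,
) -> Dict[str, int]:
    # Map phase: write intermediate pairs into the shared store.
    for document in documents:
        for key, value in mapper(document):
            store.append(key, value)
    # Reduce phase: read the grouped values back, one key at a time.
    return dict(reducer(key, store.get(key)) for key in store.keys())


if __name__ == "__main__":
    corpus = ["cloud data analytics", "cloud workflows", "data data"]
    print(run_stage(corpus, map_words, reduce_counts, SharedStore()))
```

In a real deployment the map and reduce phases would run on separate workflow nodes, and the store would be a networked service so that every node sees the same intermediate data; this sketch runs both phases sequentially in one process.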


Bibliographic details

Source
International Journal of Grid and High Performance Computing | 2013, Issue 4 | pp. 48-64 | 17 pages
Authors
Carlos Goncalves; Luis Assuncao; Jose C. Cunha
Author affiliations

ISEL - Instituto Superior de Engenharia de Lisboa, Lisbon, Portugal;

ISEL - Instituto Superior de Engenharia de Lisboa, Lisbon, Portugal;

CITI-FCT, Universidade Nova de Lisboa, Lisbon, Portugal;

Original format: PDF
Language: English
Keywords
Cloud; Data Analytics Applications; MapReduce; Text Mining; Workflow;


Similar literature

1. A Research on Big Data Analytics Security and Privacy in Cloud, Data Mining, Hadoop and Mapreduce [J]. Nandhini P. International Journal of Engineering Research and Applications. 2018, Issue 4

2. Layman Analytics System: A Cloud-Enabled System for Data Analytics Workflow Recommendation [J]. Theint Theint Aye, Gary Kee Khoon Lee, Yi Su. IEEE Transactions on Automation Science and Engineering. 2017, Issue 1

3. Decentralized executions of privacy awareness data analytics workflows in the cloud [J]. Yan Yao, Jian Cao, Shiyou Qian. Concurrency and Computation: Practice and Experience. 2019, Issue 15

4. Data analytics in the cloud with flexible MapReduce workflows [C]. Goncalves Carlos, Assuncao Luis, Cunha Jose C. 2012 IEEE 4th International Conference on Cloud Computing Technology and Science. 2012

5. Improving Hadoop performance by using metadata of related jobs in text datasets via enhancing MapReduce workflow [D]. Alshammari, Hamoud. 2016

6. Enabling Big Geoscience Data Analytics with a Cloud-Based MapReduce-Enabled and Service-Oriented Workflow Framework [O]. Zhenlong Li, Chaowei Yang, Baoxuan Jin.

7. Data Analytics in the Cloud with Flexible MapReduce Workflows [O]. Carlos Goncalves, Luis Assuncao, Jose C. Cunha. 2013
