IEEE International Congress on Big Data

A NoSQL Data Model for Scalable Big Data Workflow Execution

Abstract

Big data workflows have recently been proposed as the next-generation data-centric workflow paradigm for processing and analyzing data of ever-increasing scale, complexity, and rate of acquisition. However, a scalable distributed data model that abstracts and automates data distribution, parallelism, and scalable processing is still missing. Meanwhile, although NoSQL has emerged as a new category of data models, these models are optimized for storing and querying large datasets, not for ad hoc data analysis in which data placement and data movement are necessary for optimized workflow execution. In this paper, we propose a NoSQL data model that: 1) supports high-performance MapReduce-style workflows that automate data partitioning and data-parallel execution (in contrast to the traditional MapReduce framework, our MapReduce-style workflows are fully composable with other workflows, enabling dataflow applications with a richer structure); 2) automates virtual machine provisioning and deprovisioning on demand according to the sizes of the input datasets; and 3) enables a flexible framework for workflow executors that take advantage of the proposed NoSQL data model to improve the performance of workflow execution. Our case studies and experiments show the competitive advantages of our proposed data model. The proposed NoSQL data model is implemented in a new release of DATAVIEW, one of the most usable big data workflow systems in the community.
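
To make the ideas of automated data partitioning, input-size-driven provisioning, and a composable MapReduce-style step more concrete, the following is a minimal Python sketch. It is not DATAVIEW's API or the authors' implementation: the names `RECORDS_PER_WORKER`, `workers_needed`, `partition`, `map_word_count`, and `run_workflow` are hypothetical, a local thread pool stands in for provisioned virtual machines, and word counting is an arbitrary example task.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from math import ceil

# Hypothetical tuning knob: how many records a single worker should handle.
RECORDS_PER_WORKER = 3


def workers_needed(n_records, per_worker=RECORDS_PER_WORKER):
    """Derive the worker count from the input size (stand-in for VM provisioning)."""
    return max(1, ceil(n_records / per_worker))


def partition(records, n_parts):
    """Split records into at most n_parts contiguous chunks of near-equal size."""
    if not records:
        return []
    size = ceil(len(records) / n_parts)
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_word_count(chunk):
    """Map step: count the words in one partition."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def run_workflow(records):
    """Partition the input, run the map step in parallel, then reduce the partial results."""
    n = workers_needed(len(records))
    chunks = partition(records, n)
    # The pool is created for this run and released when the block exits,
    # loosely mirroring on-demand provisioning and deprovisioning.
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(map_word_count, chunks))
    # Reduce step: merge the per-partition counters into one result.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    print(run_workflow(["a b", "b c", "c a", "a"]))
```

Because `run_workflow` takes a plain list and returns a plain `Counter`, its result can be passed directly to another function, which is one simple way to picture a step that composes with other workflows.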