ACM SIGMOD International Conference on Management of Data

Process and dataflow control in distributed data-intensive systems

Abstract

In dataflow architectures, each dataflow operation is typically executed on a single physical node. We are concerned with distributed data-intensive systems, in which each base (i.e., persistent) set of data has been declustered over many physical nodes to achieve load balancing. Because of large base set size, each operation is executed where the base set resides, and intermediate results are transferred between physical nodes. In such systems, each dataflow operation is typically executed on many physical nodes. Furthermore, because computations are data-dependent, we cannot know until run time which subset of the physical nodes containing a particular base set will be involved in a given dataflow operation. This uncertainty creates several problems.
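The run-time uncertainty described above can be illustrated with a small sketch. It is not the paper's mechanism: it assumes a simple modulo-based declustering scheme, a hypothetical four-node cluster, and a selection as the dataflow operation. All names (`decluster`, `select_local`, `run_selection`, `NUM_NODES`) are invented for illustration. The point is only that the set of nodes producing intermediate results depends on the data and the predicate, so it cannot be fixed before execution.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size, chosen for illustration


def decluster(records, num_nodes=NUM_NODES):
    """Spread (key, value) records over nodes by key modulo node count.

    This is one possible declustering scheme, assumed here for simplicity.
    """
    nodes = defaultdict(list)
    for key, value in records:
        nodes[key % num_nodes].append((key, value))
    return nodes


def select_local(fragment, predicate):
    """Run the selection where the fragment resides; return its local output."""
    return [rec for rec in fragment if predicate(rec)]


def run_selection(nodes, predicate):
    """Return {node_id: intermediate result} for nodes that produce any output.

    Which nodes appear in the result is known only after the predicate has
    been evaluated against the data each node holds.
    """
    results = {}
    for node_id, fragment in nodes.items():
        out = select_local(fragment, predicate)
        if out:
            results[node_id] = out
    return results


# A base set of 12 records, declustered over 4 nodes.
records = [(k, k * 10) for k in range(12)]
nodes = decluster(records)

# Only records with value >= 90 qualify; node 0 holds none of them.
active = run_selection(nodes, lambda rec: rec[1] >= 90)
print(sorted(active))  # prints [1, 2, 3]
```

With a different predicate, a different subset of the four nodes would take part, which is why loading, activation, termination, and data-transfer control must cope with a participant set that is discovered during execution.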

We examine the problems of efficient program loading, dataflow-operation activation and termination, control of data transfer among dataflow operations, and transaction commit and abort in a distributed data-intensive system. We show how these problems are interrelated, and we present a unified set of mechanisms for efficiently solving them. For some of the problems, we present several solutions and compare them quantitatively.
