Journal: Cluster Computing

Nephele streaming: stream processing under QoS constraints at scale

Abstract

The ability to process large numbers of continuous data streams in near real time has become a crucial prerequisite for many scientific and industrial use cases in recent years. While the individual data streams are usually trivial to process, their aggregated data volumes easily exceed the scalability of traditional stream processing systems. At the same time, massively parallel data processing systems like MapReduce or Dryad currently enjoy tremendous popularity for data-intensive applications and have proven to scale to large numbers of nodes. Many of these systems also provide streaming capabilities. However, unlike traditional stream processors, these systems have so far disregarded the QoS requirements of prospective stream processing applications. In this paper we address this gap. First, we analyze common design principles of today’s parallel data processing frameworks and identify those principles that provide degrees of freedom in trading off the QoS goals of latency and throughput. Second, we propose a highly distributed scheme which allows these frameworks to detect violations of user-defined QoS constraints and optimize the job execution without manual interaction. As a proof of concept, we implemented our approach for our massively parallel data processing framework Nephele and evaluated its effectiveness through a comparison with Hadoop Online. For an example streaming application from the multimedia domain running on a cluster of 200 nodes, our approach improves the processing latency by a factor of at least 13 while preserving high data throughput when needed.
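To make the idea of detecting violations of a user-defined QoS constraint concrete, here is a minimal Python sketch. It is not the scheme implemented in Nephele, which the abstract does not describe in detail. It assumes a latency constraint expressed as an upper bound on the mean latency over a sliding window of recent measurements. The class `LatencyConstraintMonitor` and its parameters `max_latency_ms` and `window_size` are hypothetical names chosen for illustration.

```python
from collections import deque
from statistics import mean


class LatencyConstraintMonitor:
    """Hypothetical helper, not part of Nephele: keeps the most recent
    latency samples and checks their mean against a user-defined bound."""

    def __init__(self, max_latency_ms: float, window_size: int = 5) -> None:
        self.max_latency_ms = max_latency_ms
        # A bounded deque drops the oldest sample once the window is full.
        self.samples = deque(maxlen=window_size)

    def record(self, latency_ms: float) -> None:
        """Add one latency measurement (in milliseconds) to the window."""
        self.samples.append(latency_ms)

    def mean_latency_ms(self) -> float:
        """Mean latency over the current window, or 0.0 if it is empty."""
        return mean(self.samples) if self.samples else 0.0

    def is_violated(self) -> bool:
        """True if the windowed mean exceeds the user-defined bound."""
        return self.mean_latency_ms() > self.max_latency_ms


if __name__ == "__main__":
    monitor = LatencyConstraintMonitor(max_latency_ms=300.0, window_size=3)
    for sample in (100.0, 200.0, 300.0, 700.0):
        monitor.record(sample)
        print(sample, monitor.mean_latency_ms(), monitor.is_violated())
```

In a real system, a detected violation would trigger some countermeasure that trades throughput for latency. This sketch covers only the detection step and makes no claim about how the paper performs the optimization.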