Resource management for data streaming applications.

Abstract

This dissertation investigates novel middleware mechanisms for building streaming applications. Developing streaming applications is challenging because (i) they are continuous in nature; (ii) they require efficient transport of data to and from distributed sources and sinks; (iii) they need access to heterogeneous resources spanning sensor networks and high-performance computing; and (iv) they are time critical.

One common characteristic of these applications is data fusion. I present a novel programming abstraction, called DFuse, that makes it easier to develop fusion applications. The application program is specified as a dataflow graph with fusion points. The DFuse middleware instantiates the graph on distributed resources and subsumes issues inherent in distributed programming, such as failures, partial fusion, buffer management, and synchronization. Through experiments, I demonstrate that the DFuse API implementation has reasonable overhead.

I also address the challenges involved in allocating high-performance computing (HPC) resources for these applications. The scheduling framework consists of a heuristic algorithm, called Streamline, for placing a streaming application's dataflow graph on HPC resources. I demonstrate the performance benefits of Streamline in a controlled environment through simulation, as well as in a wide-area environment using PlanetLab. I also demonstrate that the scheduling algorithm can be implemented as a grid service and deployed in a wide-area environment.

While Streamline places such streaming applications well, application dynamics may cause the computation and communication characteristics of the application to change over time. I present a Distributed Scheduling heuristic and a Periodic Streamline algorithm to address the limitations of the Static Streamline algorithm. The performance of the Distributed Algorithm is compared with Periodic Streamline and Static Streamline. Through micro measurements, I show that the Distributed Algorithm performs close to Periodic Streamline and 6x better than Static Streamline under dynamic resource availability. Through a scalability study, I also show that the Distributed Algorithm performs close (within 5%) to the Periodic Streamline algorithm with much less (7.5x less) overhead.

Finally, using a case study of such a data streaming and ubiquitous application, and the experience gained from building it, we propose a taxonomy of the ubiquitous computing stack called UbiqStack. UbiqStack consists of five orthogonal functionalities of the most commonly occurring subsystems for ubiquitous applications. Through the lens of the UbiqStack taxonomy, we survey a variety of subsystems designed to be the building blocks from which sophisticated infrastructures for ubiquitous computing can be assembled.

In summary, I develop the Fusion Channel programming abstraction, which makes it easier for domain experts to build data streaming applications. An application only needs to specify the input and output connections to fusion channels, and the fusion functions. The subsystems developed in this dissertation take care of instantiating an application, allocating resources for the application (via scheduling heuristics), and dynamically managing the resources (via dynamic scheduling). Through performance evaluation, I demonstrate that the resources are allocated efficiently to optimize the throughput and latency constraints of an application. Through extensive micro measurements and scalability studies, I have established my thesis: "An intuitive programming abstraction will make it easier to build dynamic, distributed, and ubiquitous data streaming applications. Moreover, such an abstraction will enable an efficient allocation of shared and heterogeneous computational resources thereby making it easier for domain experts to build these applications."

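Bibliographic record

  • Author

    Agrawalla, Bikash Kumar

  • Author affiliation

    Georgia Institute of Technology

  • Degree-granting institution: Georgia Institute of Technology
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pagination: 168 p.
  • Total pages: 168
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords: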
