Tools and strategies for debugging distributed stream processing applications

Abstract

Distributed data stream processing applications are often characterized by data flow graphs consisting of a large number of built-in and user-defined operators connected via streams. These flow graphs are typically deployed on a large set of nodes. Data processing is carried out on the fly, with minimum latency, as tuples arrive at possibly very high rates. It is well known that developing and debugging distributed, multithreaded, and asynchronous applications, such as stream processing applications, can be challenging; without domain-specific debugging support, developers struggle to debug them. In this paper, we describe tools and language support for debugging distributed stream processing applications. Our key insight is to view the debugging of stream processing applications from four different, but related, perspectives. First, debugging the semantics of the application involves verifying the operator-level composition and inspecting the flows at the logical level. Second, debugging the user-defined operators involves traditional source-code debugging that is strongly tied to the stream-level interactions. Third, debugging the deployment details of the application requires understanding its runtime physical layout and configuration. Fourth, debugging the performance of the application requires inspecting various performance metrics (such as communication rates and CPU utilization) associated with the streams, operators, and nodes in the system.
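To make the first perspective concrete, here is a minimal sketch of an operator-level composition check over a logical flow graph. It assumes each operator declares its input and output stream schemas as attribute-to-type dictionaries and reports every stream whose producer does not supply what its consumer expects. The `Operator` and `FlowGraph` classes, the schema representation, and the type names are hypothetical illustrations, not Spade's compiler or API.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    """An operator in the logical flow graph (hypothetical model, not the Spade API)."""
    name: str
    in_schema: dict[str, str]   # attribute -> type the operator expects on its input stream
    out_schema: dict[str, str]  # attribute -> type the operator emits on its output stream


@dataclass
class FlowGraph:
    operators: dict[str, Operator] = field(default_factory=dict)
    streams: list[tuple[str, str]] = field(default_factory=list)  # (producer, consumer)

    def add(self, op: Operator) -> None:
        self.operators[op.name] = op

    def connect(self, producer: str, consumer: str) -> None:
        self.streams.append((producer, consumer))

    def check_composition(self) -> list[str]:
        """Report every stream whose producer does not supply what its consumer expects."""
        errors = []
        for producer, consumer in self.streams:
            supplied = self.operators[producer].out_schema
            expected = self.operators[consumer].in_schema
            for attr, typ in expected.items():
                if attr not in supplied:
                    errors.append(f"{producer}->{consumer}: missing attribute '{attr}'")
                elif supplied[attr] != typ:
                    errors.append(
                        f"{producer}->{consumer}: '{attr}' is {supplied[attr]}, expected {typ}"
                    )
        return errors


# Usage: a three-operator pipeline whose sink expects an attribute nobody produces.
schema = {"ticker": "str", "price": "float"}
g = FlowGraph()
g.add(Operator("Source", {}, schema))
g.add(Operator("Filter", schema, schema))
g.add(Operator("Sink", {"ticker": "str", "volume": "int"}, {}))
g.connect("Source", "Filter")
g.connect("Filter", "Sink")
print(g.check_composition())  # ["Filter->Sink: missing attribute 'volume'"]
```

A check of this kind runs entirely on the logical graph, so it can flag composition errors before anything is deployed to nodes.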
In light of this characterization, we developed several tools, including a debugger-aware compiler and an associated stream debugger, composition and deployment visualizers, and performance visualizers. We also developed language support, including configuration knobs for logging and tracing; deployment configurations such as operator-to-process and process-to-node mappings; monitoring directives to inspect streams; and special sink adapters to intercept and dump streaming data to files and sockets, to name a few. We describe these tools in the context of Spade, a language for creating distributed stream processing applications, and System S, a distributed stream processing middleware under development at the IBM Watson Research Center.
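The idea behind a sink adapter that intercepts and dumps streaming data can be sketched as a transparent "tap" placed on a stream. It copies every tuple to a file-like object and forwards it unchanged. This is a minimal sketch under stated assumptions: tuples are modeled as Python dicts, streams as iterators, and the dump format is CSV. The `tap` and `quotes` names are hypothetical and do not reflect how Spade's sink adapters are actually implemented.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO


def tap(stream: Iterable[dict], out: TextIO, fields: list[str]) -> Iterator[dict]:
    """Hypothetical 'tap' sink adapter: write each tuple to `out` as CSV, then pass it on."""
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for tup in stream:
        writer.writerow(tup)  # dump a copy of the tuple for offline inspection
        out.flush()           # flush per tuple so the dump survives a crash downstream
        yield tup             # forward the tuple unchanged to the rest of the graph


def quotes() -> Iterator[dict]:
    """Stand-in source operator producing a few example tuples."""
    yield {"ticker": "IBM", "price": 120.5}
    yield {"ticker": "HPQ", "price": 41.0}


buf = io.StringIO()
tickers = [t["ticker"] for t in tap(quotes(), buf, ["ticker", "price"])]
print(tickers)                      # ['IBM', 'HPQ']
print(buf.getvalue().splitlines())  # ['ticker,price', 'IBM,120.5', 'HPQ,41.0']
```

To dump to a file instead of memory, pass `open("dump.csv", "w", newline="")` as `out`. For a socket, pass `sock.makefile("w", newline="")`. Because the tap is a generator that yields each tuple unchanged, downstream operators see the same stream with or without it. Flushing per tuple trades throughput for a dump that is complete up to the point of failure.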

Bibliographic details

  • Source
    Software, 2009, no. 16, pp. 1347-1376 (30 pages)
  • Author affiliations

    IBM Research, 19 Skyline Dr., Hawthorne, NY 10532, U.S.A.;

    IBM Research, 19 Skyline Dr., Hawthorne, NY 10532, U.S.A.;

    IBM Research, 19 Skyline Dr., Hawthorne, NY 10532, U.S.A.;

    IBM Research, 19 Skyline Dr., Hawthorne, NY 10532, U.S.A.;

    IBM Software Group, 3605 Highway 52 N, Rochester, MN 55901, U.S.A.;

    IBM Software Group, 3605 Highway 52 N, Rochester, MN 55901, U.S.A.;

    IBM Research, 19 Skyline Dr., Hawthorne, NY 10532, U.S.A.;

    IBM Research, 19 Skyline Dr., Hawthorne, NY 10532, U.S.A.;

  • Indexed in: Science Citation Index (SCI, U.S.A.); Engineering Index (EI, U.S.A.)
  • Original format: PDF
  • Language: English
  • Keywords

    debugging tools; distributed stream processing; System S; SPADE;
