BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark

Abstract

Developers use cloud computing platforms to process large quantities of data in parallel when developing big data analytics. Debugging the massively parallel computations that run in today's data centers is time-consuming and error-prone. To address this challenge, we design a set of interactive, real-time debugging primitives for big data processing in Apache Spark, the next-generation data-intensive scalable cloud computing platform. This requires rethinking the notion of step-through debugging in a traditional debugger such as gdb, because pausing the entire computation across distributed worker nodes causes significant delay, and naively inspecting millions of records using a watchpoint is too time-consuming for an end user.

First, BigDebug's simulated breakpoints and on-demand watchpoints allow users to selectively examine distributed, intermediate data on the cloud with little overhead. Second, a user can pinpoint a crash-inducing record and selectively resume relevant sub-computations after a quick fix. Third, a user can determine the root causes of errors (or delays) at the level of individual records through a fine-grained data provenance capability. Our evaluation shows that BigDebug scales to terabytes and that its record-level tracing incurs less than 25% overhead on average. It determines crash culprits orders of magnitude more accurately and provides up to 100% time savings compared to the baseline replay debugger. The results show that BigDebug supports debugging at interactive speeds with minimal performance impact.
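The second primitive in the abstract (pinpointing a crash-inducing record and resuming only the affected sub-computation) can be illustrated with a minimal sketch. It is plain Python with no Spark dependency and is not BigDebug's API: the names `CrashCulprit`, `map_with_culprits`, and `resume` are hypothetical, and nested lists stand in for distributed partitions. The idea shown is only that a failing record is captured together with its location instead of aborting the whole job, and that after a fix only the captured records are re-run.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class CrashCulprit:
    """Hypothetical record of one failing input and where it came from."""
    partition: int      # index of the partition holding the record
    offset: int         # position of the record within its partition
    record: Any         # the input record that raised
    error: Exception    # the exception it raised


def map_with_culprits(
    partitions: Iterable[List[Any]],
    fn: Callable[[Any], Any],
) -> Tuple[List[Any], List[CrashCulprit]]:
    """Apply fn to every record; collect failures instead of aborting."""
    results: List[Any] = []
    culprits: List[CrashCulprit] = []
    for p_idx, partition in enumerate(partitions):
        for offset, record in enumerate(partition):
            try:
                results.append(fn(record))
            except Exception as exc:
                # Remember the failing record and its location, then continue.
                culprits.append(CrashCulprit(p_idx, offset, record, exc))
    return results, culprits


def resume(
    culprits: List[CrashCulprit],
    fn: Callable[[Any], Any],
    fix: Callable[[Any], Any],
) -> List[Any]:
    """Re-run only the failed records after applying a user-supplied fix."""
    return [fn(fix(c.record)) for c in culprits]


# Usage: one malformed record ("x4") in the second partition.
partitions = [["1", "2"], ["3", "x4"], ["5"]]
results, culprits = map_with_culprits(partitions, int)
repaired = resume(culprits, int, lambda r: r.lstrip("x"))
```

In this toy run the four well-formed records are processed normally, the malformed one is reported with its partition index and offset, and only that record is recomputed after the fix. A real distributed implementation would also have to handle shuffles, lineage across stages, and communication between workers and the driver, none of which this sketch attempts.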

Bibliographic details

Journal: other
Authors: Muhammad Ali Gulzar; Matteo Interlandi; Seunghyun Yoo; Sai Deep Tetali; Tyson Condie; Todd Millstein; Miryung Kim
Year: 2016
Pages: 784–795
Total pages: 34
Format: PDF
Keywords: Debugging; big data analytics; interactive tools; data-intensive scalable computing (DISC); fault localization and recovery
Indexed: 2022-08-21 11:12:39

Similar literature

1. Use of Process Data Obtained from a Data Acquisition System for Optimizing and Debugging Extrusion Processes [J]. Kun S. Hyun, Mark A. Spalding. Advances in Polymer Technology. 1996, No. 1
2. Researchers Submit Patent Application, "Data Processing Apparatus and Related Methods of Debugging Processing Circuitry", for Approval [J]. Journal of Engineering. 2013, No. 12
3. Modeling Stand for Debugging and Testing of a Complex Information Processing System with Data from Many Different-Type Sources [J]. V. I. Gouz, V. P. Lipatov, T. V. Baringolts. Radioelectronics and Communications Systems. 2012, No. 1
4. BigDebug: Debugging Primitives for Interactive Big Data Processing in Spark [C]. Muhammad Ali Gulzar, Matteo Interlandi, Seunghyun Yoo. 2016 IEEE/ACM 38th International Conference on Software Engineering. 2016
5. Streamlining Big Data Processing Pipelines via Unix Memory Tools, Persistent Spark Datasets, and the Apache Ignite In-memory File System [D]. Blair, Walter. 2018
6. Big Data Approaches for the Analysis of Large-Scale fMRI Data Using Apache Spark and GPU Processing: A Demonstration on Resting-State fMRI Data from the Human Connectome Project [O]. Roland N. Boubela, Klaudius Kalcher, Wolfgang Huf. 2015
7. Big Data approaches for the analysis of large-scale fMRI data using Apache Spark and GPU processing: A demonstration on resting-state fMRI data from the Human Connectome Project [O]. Roland N. Boubela, Klaudius Kalcher, Wolfgang Huf. 2016
