Conference: IEEE/ACM International Conference on Utility and Cloud Computing

Reducing Tail Latencies while Improving Resiliency to Timing Errors for Stream Processing Workloads

Abstract

Stream processing is an increasingly popular model for online data processing that can be partitioned into streams of elements. It is commonly used in real-time data analytics services, such as processing Twitter tweets and Internet of Things (IoT) device feeds. Current stream processing frameworks boast high throughput and low average latency. However, users of these frameworks may desire lower tail latencies and better real-time performance for their applications. In practice, there are a number of errors that can affect the performance of stream processing applications, such as garbage collection and resource contention. For some applications, these errors may cause unacceptable violations of real-time constraints. In this paper we propose applying redundancy in the data processing pipeline to increase the resiliency of stream processing applications to timing errors. This results in better real-time performance and a reduction in tail latency. We present a methodology and apply this redundancy in a framework based on Twitter's Heron. Finally, we evaluate the effectiveness of this technique against a range of injected timing errors using benchmarks from Intel's Storm Benchmark. Our results show that redundant tuple processing can effectively reduce the tail latency, and that the number of missed deadlines can also be reduced by up to 94% in the best case. We also study the potential effects of duplication when applied at different stages in the topology. For the topologies in this paper, we further observe that duplication is most effective when computation is redundant at the first bolt. Finally, we evaluate the additional overhead that duplicating tuples brings to a stream processing topology. Our results also show that computation overhead scales slower than communication, and that the real-time performance is improved in spite of the overheads. Overall we conclude that redundancy through duplicated tuples is indeed a powerful tool for increasing the resiliency to intermittent runtime timing errors.
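
To make the duplication idea concrete, the sketch below shows how redundant tuple processing could look in a Storm-style Java topology (Heron exposes a Storm-compatible API, and Intel's Storm Benchmark targets the same model). The class names DuplicatingBolt and DedupBolt, the "id" and "payload" field names, and the first-copy-wins deduplication scheme are illustrative assumptions, not the implementation evaluated in the paper.

```java
import java.util.HashSet;
import java.util.Set;

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

/** Placed as the first bolt: emits every incoming tuple twice so two
 *  downstream copies race, letting the faster copy mask a transient
 *  timing error (e.g. a GC pause) that delays the slower one. */
class DuplicatingBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        Object id = input.getValueByField("id");
        Object payload = input.getValueByField("payload");
        // Both copies carry the same id so a later stage can keep the
        // first result it sees and drop the late duplicate.
        collector.emit(new Values(id, payload));
        collector.emit(new Values(id, payload));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "payload"));
    }
}

/** Placed after the duplicated processing stage: forwards only the first
 *  copy of each tuple id and silently discards the redundant one. */
class DedupBolt extends BaseBasicBolt {
    private final Set<Object> seen = new HashSet<>();

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        Object id = input.getValueByField("id");
        if (seen.add(id)) {  // first copy wins; the late duplicate is dropped
            collector.emit(new Values(id, input.getValueByField("payload")));
        }
        // A production version would bound or expire this set.
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("id", "payload"));
    }
}
```

When wiring such a topology, both copies of a given id must reach the same deduplication task, for example via a fieldsGrouping on the "id" field, so that only one result leaves the pipeline; placing DuplicatingBolt immediately after the spout mirrors the abstract's observation that duplication is most effective when computation is made redundant at the first bolt.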
