Conference: IEEE International Conference on Services Computing

Increased Fault-Tolerance and Real-Time Performance Resiliency for Stream Processing Workloads through Redundancy


Abstract

Data analytics and telemetry have become paramount for monitoring and maintaining quality of service, in addition to supporting business analytics. Stream processing, a model in which a network of operators receives and processes continuously arriving discrete elements, is well suited to these needs. Existing studies and frameworks have focused on continuity of operations and aggregate performance metrics. However, real-time performance and tail latency are also important, and timing errors caused by performance faults or failed-communication faults affect real-time performance more drastically than they affect aggregate metrics. In this paper, we introduce redundancy in the stream data to improve real-time performance and resiliency to timing errors caused by either kind of fault. We also address limitations in previous solutions by using a fine-grained acknowledgment tracking scheme, which both makes redundancy more effective against performance faults and makes it effective against failed-communication faults. Our results show that fine-grained acknowledgment schemes can improve tail and mean latencies by approximately 30%. We also show that these schemes improve resiliency to performance faults compared to existing work: our improvements result in 47.4% to 92.9% fewer missed deadlines, compared to 17.3% to 50.6% for comparable topologies and redundancy levels in the state of the art. Finally, we show that redundancies of 25% to 100% can reduce the number of data elements that miss their deadline constraints by 0.76% to 14.04% for applications with high fan-out and by 7.45% to 50% for applications with no fan-out.
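The abstract does not describe the paper's actual mechanism, so the following is only a minimal sketch of the general idea, under assumptions: each data element may be sent as several redundant copies, the earliest copy to arrive determines whether the deadline is met, and acknowledgments are tracked per element so that duplicate copies are ignored. The names `earliest_arrival`, `missed_deadlines`, and `AckTracker`, as well as the latency values and the 50 ms deadline, are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Sequence


def earliest_arrival(copy_latencies: Sequence[Optional[float]]) -> Optional[float]:
    """Latency of the first copy to arrive; None means every copy was lost."""
    arrived = [t for t in copy_latencies if t is not None]
    return min(arrived) if arrived else None


def missed_deadlines(elements: List[Sequence[Optional[float]]], deadline: float) -> int:
    """Count elements whose earliest copy is lost or arrives after the deadline."""
    missed = 0
    for copies in elements:
        first = earliest_arrival(copies)
        if first is None or first > deadline:
            missed += 1
    return missed


class AckTracker:
    """Tracks acknowledgments per element and ignores duplicate acks (illustrative only)."""

    def __init__(self) -> None:
        self._acked: Dict[int, float] = {}

    def ack(self, element_id: int, latency: float) -> bool:
        """Record an ack; return True only for the first ack of this element."""
        if element_id in self._acked:
            return False
        self._acked[element_id] = latency
        return True

    def unacked(self, sent_ids: Sequence[int]) -> List[int]:
        """Elements that were sent but have no acknowledgment yet."""
        return [i for i in sent_ids if i not in self._acked]


# Hypothetical latencies in ms; None marks a copy lost to a communication fault.
no_redundancy = [[12.0], [95.0], [None], [18.0]]
with_redundancy = [[12.0, 14.0], [95.0, 20.0], [None, 25.0], [18.0, 90.0]]

print(missed_deadlines(no_redundancy, deadline=50.0))
print(missed_deadlines(with_redundancy, deadline=50.0))
```

In this toy data, a slow copy (a performance fault) and a lost copy (a failed-communication fault) each cause a missed deadline without redundancy, while a second copy masks both. The sketch says nothing about the overhead of the extra copies, which the paper's evaluation would need to account for.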
