Journal: Computer architecture news

TPC: Target-Driven Parallelism Combining Prediction and Correction to Reduce Tail Latency in Interactive Services

Abstract

In interactive services such as web search, recommendations, games, and finance, reducing tail latency is crucial for providing a fast response to every user. Using web search as a driving example, we systematically characterize the interactive workload to identify the opportunities and challenges for reducing tail latency. We find that the workload consists mainly of short requests that do not benefit from parallelism, and a few long requests that significantly impact the tail but exhibit high parallelism speedup. This motivates estimating request execution time with a predictor to identify long requests and parallelize them. Prediction, however, is not perfect; a long request mispredicted as short is likely to contribute to the server tail latency, setting a ceiling on the achievable tail latency. We propose TPC, an approach that judiciously combines prediction information with dynamic correction for inaccurate predictions. Dynamic correction increases parallelism to accelerate a long request that was mispredicted as short. TPC carefully selects the appropriate target latencies based on system load and parallelism efficiency to reduce tail latency. We implement TPC and several prior approaches and compare them experimentally on a single search server and on a cluster of 40 search servers. The experimental results show that TPC reduces the 99th- and 99.9th-percentile latency by up to 40% compared with the best prior work. Moreover, we evaluate TPC on a finance server, demonstrating its effectiveness in reducing the tail latency of interactive services beyond web search.
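
The interplay of prediction and dynamic correction described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Policy` fields, the `latency_ms` function, and the single fixed speedup factor are all hypothetical simplifications, and the load-dependent selection of target latencies is not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    # All three values are illustrative placeholders, not numbers from the paper.
    long_threshold_ms: float     # predictions above this are treated as long
    correction_target_ms: float  # elapsed time that triggers correction
    max_speedup: float           # assumed speedup once a request is parallelized


def latency_ms(true_ms: float, predicted_ms: float, policy: Policy) -> float:
    """Latency of one request under prediction plus dynamic correction."""
    if predicted_ms > policy.long_threshold_ms:
        # Predicted long: parallelize from the start.
        return true_ms / policy.max_speedup
    if true_ms <= policy.correction_target_ms:
        # Genuinely short: finishes sequentially before correction triggers.
        return true_ms
    # Mispredicted as short: run sequentially up to the target, then
    # parallelize the remaining work.
    remaining = true_ms - policy.correction_target_ms
    return policy.correction_target_ms + remaining / policy.max_speedup
```

Under these assumed numbers, `Policy(long_threshold_ms=50, correction_target_ms=20, max_speedup=2.0)` gives a 100 ms request that is mispredicted as 10 ms a latency of 60 ms, compared with 100 ms if it ran sequentially to completion and 50 ms if it had been predicted correctly.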
