Journal: IEEE Transactions on Network and Service Management

TTLoC: Taming Tail Latency for Erasure-Coded Cloud Storage Systems



Abstract

Distributed storage systems are known to be susceptible to long tails in response time. In modern online storage systems such as Bing, Facebook, and Amazon, the long tails of the service latency are of particular concern, with 99.9th percentile response times being orders of magnitude worse than the mean. As erasure codes emerge as a popular technique to achieve high data reliability in distributed storage while attaining space efficiency, taming tail latency remains an open problem due to the lack of mathematical models for analyzing such systems. To this end, we propose a framework for quantifying and optimizing tail latency in erasure-coded storage systems. In particular, we derive closed-form upper bounds on tail latency for arbitrary service-time distributions and heterogeneous files. Based on the model, we formulate an optimization problem that jointly minimizes the weighted latency tail probability of all files over two sets of decisions: the placement of files on the servers and the choice of servers used to access the requested files. The non-convex problem is solved using an efficient alternating optimization algorithm. Further, we mathematically quantify, in closed form, the tail index of the service latency for arbitrary erasure-coded storage, i.e., the exponent at which the latency tail probability diminishes to zero, by characterizing the asymptotic behavior of latency distribution tails. We further show that probabilistic scheduling-based algorithms are (asymptotically) optimal since they are able to achieve the exact tail index. Evaluation results show a significant reduction of tail latency for erasure-coded storage systems with realistic workloads. Based on the offline algorithm, we develop an online version and show its superiority over state-of-the-art algorithms, e.g., join-shortest-queue (JSQ), power-of-d [Pof(d)], and least-load [LL(d)]. Finally, a cloud storage system is implemented in a real cloud environment to show the superiority of our approach over the considered baselines.
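To make the notion of latency tail probability concrete, the following is a minimal simulation sketch and not the paper's algorithm. It assumes an (n, k) erasure-coded file whose read completes when the slowest of k chunk fetches returns, selects the k servers uniformly at random (the paper instead optimizes the selection probabilities), draws exponential service times, and ignores queueing entirely. The function names, server rates, and thresholds are all hypothetical.

```python
import random


def read_latency(service_rates, k, rng):
    """Latency of one read under the simplified model.

    Picks k of the n servers uniformly at random (an assumption; the paper
    optimizes these probabilities) and waits for the slowest chunk fetch.
    Service times are exponential and queueing delay is not modeled.
    """
    chosen = rng.sample(range(len(service_rates)), k)
    return max(rng.expovariate(service_rates[j]) for j in chosen)


def simulate(service_rates, k, num_reads=20000, seed=0):
    """Return a list of simulated read latencies, reproducible via seed."""
    rng = random.Random(seed)
    return [read_latency(service_rates, k, rng) for _ in range(num_reads)]


def tail_probability(latencies, threshold):
    """Empirical estimate of Pr(latency > threshold)."""
    return sum(1 for x in latencies if x > threshold) / len(latencies)


if __name__ == "__main__":
    # Hypothetical heterogeneous service rates for n = 7 servers.
    rates = [1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.25]
    latencies = simulate(rates, k=4)
    for threshold in (2.0, 5.0, 10.0):
        print(threshold, tail_probability(latencies, threshold))
```

Under this toy model, biasing the selection away from slow servers lowers the estimated tail probability. That is the intuition behind optimizing the access probabilities jointly with file placement.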
