TTLoC: Taming Tail Latency for Erasure-Coded Cloud Storage Systems

Abstract

Distributed storage systems are known to be susceptible to long tails in response time. In modern online storage systems such as Bing, Facebook, and Amazon, the long tails of the service latency are of particular concern, with 99.9th percentile response times being orders of magnitude worse than the mean. As erasure codes emerge as a popular technique to achieve high data reliability in distributed storage while attaining space efficiency, taming tail latency remains an open problem due to the lack of mathematical models for analyzing such systems. To this end, we propose a framework for quantifying and optimizing tail latency in erasure-coded storage systems. In particular, we derive closed-form upper bounds on tail latency for arbitrary service time distributions and heterogeneous files. Based on the model, we formulate an optimization problem that jointly minimizes the weighted latency tail probability of all files over the placement of files on the servers and the choice of servers used to access the requested files. The non-convex problem is solved using an efficient alternating optimization algorithm. Further, we mathematically quantify, in closed form, the tail index of the service latency for arbitrary erasure-coded storage, i.e., the exponent at which the latency tail probability diminishes to zero, by characterizing the asymptotic behavior of latency distribution tails. We further show that probabilistic scheduling-based algorithms are (asymptotically) optimal since they are able to achieve the exact tail index. Evaluation results show a significant reduction of tail latency for erasure-coded storage systems under realistic workloads. Based on the offline algorithm, an online version is developed, and its superiority over state-of-the-art algorithms, e.g., join-shortest-queue (JSQ), power-of-d [Pof(d)], and least-load [LL(d)], is shown. Finally, a cloud storage system is implemented in a real cloud environment to show the superiority of our approach over the considered baselines.
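
The notion of latency tail probability used in the abstract can be illustrated with a small Monte Carlo sketch. This is not the paper's model or algorithm: it assumes no queueing, exponentially distributed chunk service times, and a read that fetches k chunks from k servers chosen uniformly at random out of n (a trivial special case of probabilistic scheduling), finishing when the slowest chunk arrives. All function names and parameters here are hypothetical.

```python
import math
import random


def sample_read_latency(rates, k, rng):
    """Latency of one read: pick k distinct servers uniformly at random
    and wait for the slowest of the k chunk fetches (no queueing assumed)."""
    servers = rng.sample(range(len(rates)), k)
    return max(rng.expovariate(rates[s]) for s in servers)


def tail_probability(rates, k, threshold, trials=20000, seed=0):
    """Monte Carlo estimate of P(read latency > threshold)."""
    rng = random.Random(seed)
    hits = sum(
        sample_read_latency(rates, k, rng) > threshold for _ in range(trials)
    )
    return hits / trials


def exact_tail_homogeneous(rate, k, threshold):
    """Closed form when every server has the same exponential rate:
    P(max of k i.i.d. Exp(rate) > x) = 1 - (1 - exp(-rate * x)) ** k."""
    return 1.0 - (1.0 - math.exp(-rate * threshold)) ** k


if __name__ == "__main__":
    rates = [1.0] * 5  # five identical servers (illustrative values)
    for k in (1, 2, 4):
        est = tail_probability(rates, k, threshold=3.0)
        exact = exact_tail_homogeneous(1.0, k, 3.0)
        print(f"k={k}: simulated={est:.4f} exact={exact:.4f}")
```

Even in this simplified setting the tail probability grows with k, since a read is only as fast as its slowest chunk; the paper's framework additionally accounts for queueing, heterogeneous files, and file placement.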

Bibliographic record

Source
IEEE Transactions on Network and Service Management | 2019, no. 4 | pp. 1609-1623 | 15 pages
Author affiliations

Purdue Univ, Sch Ind Engn, W Lafayette, IN 47907 USA;

Purdue Univ, Sch Ind Engn, W Lafayette, IN 47907 USA | Purdue Univ, Sch Elect & Comp Engn, W Lafayette, IN 47907 USA;

George Washington Univ, Dept Elect & Comp Engn, Washington, DC 20052 USA;

Original format: PDF
Language: English
Keywords
Optimization; Servers; Probabilistic logic; Cloud computing; Indexes; Encoding; Queueing analysis; Tail latency; erasure coding; distributed storage systems; bipartite matching; alternating optimization; Laplace-Stieltjes transform

Related literature

1. TTLCache: Taming Latency in Erasure-Coded Storage Through TTL Caching [J]. Al-Abbasi Abubakr O., Aggarwal Vaneet. IEEE Transactions on Network and Service Management, 2020, no. 3.
2. Modeling and Optimization of Latency in Erasure-coded Storage Systems [J]. Vaneet Aggarwal, Tian Lan. Foundations and Trends in Communications and Information Theory, 2021, no. 3.
3. Understanding the latency distribution of cloud object storage systems [J]. Su Yi, Feng Dan, Hua Yu. Journal of Parallel and Distributed Computing, 2019, issue JUNa.
4. Taming Tail Latency for Erasure-coded, Distributed Storage Systems [C]. Vaneet Aggarwal, Jingxian Fan, Tian Lan. IEEE International Conference on Computer Communications, 2017.
5. Enhancing Performance of Cloud Computing Services Through Improving Reliability and Taming Latency [D]. Xiang, Yu. 2015.
6. A Systematic Review on Cloud Storage Mechanisms Concerning e-Healthcare Systems [O]. Adnan Tahir, Fei Chen, Habib Ullah Khan. 2020.
7. On the Latency and Energy Efficiency of Erasure-Coded Cloud Storage Systems [O]. Kumar, Akshay; Tandon, Ravi; Clancy, T. Charles. 2015.
