Stocator: Providing High Performance and Fault Tolerance for Apache Spark Over Object Storage

Abstract

Until now, object storage has not been a first-class citizen of the Apache Hadoop ecosystem, including Apache Spark. Hadoop connectors to object storage have been based on file semantics, an impedance mismatch that leads to low performance and to the need for an additional consistent storage system to achieve fault tolerance. In particular, Hadoop depends on its underlying storage system and the associated connector for fault tolerance and for allowing speculative execution. However, these characteristics are obtained through file operations that are not native to object storage and are both costly and non-atomic. As a result, these connectors are inefficient and, more importantly, cannot help achieve fault tolerance on object storage. We introduce Stocator, whose novel algorithm achieves both high performance and fault tolerance by taking advantage of object storage semantics. This greatly decreases the number of operations on object storage and enables a much simpler approach to dealing with the eventually consistent semantics typical of object storage. We have implemented Stocator and released it as open source. Performance testing with Apache Spark shows that it can be 18 times faster for write-intensive workloads and can perform 30 times fewer operations on object storage than the legacy Hadoop connectors, reducing costs both for the client and for the object storage service provider.
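The abstract attributes the legacy connectors' cost to file operations, such as rename, that object stores do not provide natively. The Python sketch here is a hypothetical toy model, not Stocator's algorithm and not any real connector's code. It assumes a flat in-memory key/value store, a simplified two-step rename commit (task commit, then job commit), and an object-native variant that writes each final object once with the attempt ID embedded in its name. `ToyObjectStore`, `rename_prefix`, `rename_based_commit`, and `direct_write_commit` are illustrative names only.

```python
"""Toy request counter: rename-based commit vs. direct writes on an object store.

Hypothetical illustration only; this is not Stocator's code or any real connector.
"""
from dataclasses import dataclass, field


@dataclass
class ToyObjectStore:
    """Flat key -> bytes store that counts each request it receives."""
    objects: dict = field(default_factory=dict)
    ops: dict = field(
        default_factory=lambda: {"PUT": 0, "LIST": 0, "COPY": 0, "DELETE": 0}
    )

    def put(self, key: str, data: bytes) -> None:
        self.ops["PUT"] += 1
        self.objects[key] = data

    def list_prefix(self, prefix: str) -> list:
        self.ops["LIST"] += 1
        return sorted(k for k in self.objects if k.startswith(prefix))

    def copy(self, src: str, dst: str) -> None:
        self.ops["COPY"] += 1
        self.objects[dst] = self.objects[src]

    def delete(self, key: str) -> None:
        self.ops["DELETE"] += 1
        del self.objects[key]

    def total_ops(self) -> int:
        return sum(self.ops.values())


def rename_prefix(store: ToyObjectStore, src: str, dst: str) -> None:
    # No native rename: list the "directory", then copy and delete every object.
    # The loop is not atomic; a crash midway leaves objects under both prefixes.
    for key in store.list_prefix(src):
        store.copy(key, dst + key[len(src):])
        store.delete(key)


def rename_based_commit(store: ToyObjectStore, out: str, num_tasks: int) -> None:
    # Simplified file-semantics commit (assumed): write under a temporary attempt
    # path, rename it at task commit, then rename again at job commit.
    for t in range(num_tasks):
        attempt = f"{out}/_temporary/0/_temporary/attempt_{t}/"
        store.put(attempt + f"part-{t:05d}", b"data")
        rename_prefix(store, attempt, f"{out}/_temporary/0/task_{t}/")
    for t in range(num_tasks):
        rename_prefix(store, f"{out}/_temporary/0/task_{t}/", f"{out}/")
    store.put(f"{out}/_SUCCESS", b"")


def direct_write_commit(store: ToyObjectStore, out: str, num_tasks: int) -> None:
    # Object-native alternative (assumed): each attempt writes its final object
    # once, with the attempt ID in the name, so no copy or delete is required.
    for t in range(num_tasks):
        store.put(f"{out}/part-{t:05d}-attempt_{t}", b"data")
    store.put(f"{out}/_SUCCESS", b"")


if __name__ == "__main__":
    legacy, native = ToyObjectStore(), ToyObjectStore()
    rename_based_commit(legacy, "out", 10)
    direct_write_commit(native, "out", 10)
    print("rename-based:", legacy.total_ops(), legacy.ops)
    print("direct-write:", native.total_ops(), native.ops)
```

With ten tasks the toy records 71 requests on the rename path (11 PUT, 20 LIST, 20 COPY, 20 DELETE) against 11 PUTs for direct writes. The model ignores metadata checks, object sizes, and consistency effects that a real connector must handle, so its counts say nothing about the 18x and 30x figures reported in the paper; it only shows why emulating rename with copy and delete multiplies requests and gives up atomicity.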

Bibliographic details

Source: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2018, pp. 462-471 (10 pages)
Authors: Gil Vernik; Michael Factor; Elliot K. Kolodner; Pietro Michiardi; Effi Ofer; Francesco Pace
Original format: PDF
Keywords: Task analysis; Connectors; Sparks; Containers; Semantics; Fault tolerance; Fault tolerant systems

Similar documents

1. A fast access big data approach for configurable and scalable object storage enabling mixed fault-tolerance [J]. Valêncio Carlos Roberto, Caetano André Francisco Morielo, Colombini Angelo Cesar. Journal of Computer Sciences, 2017, no. 6.
2. A Fast Access Big Data Approach for Configurable and Scalable Object Storage Enabling Mixed Fault-Tolerance [J]. Valêncio Carlos Roberto, Caetano André. Journal of Computer Sciences, 2017, no. 6.
3. Performance tuning policies for application level fault tolerance in distributed object systems [J]. Theodoros Soldatos, Nantia Iakovidou. Journal of Computational Methods in Sciences and Engineering, 2006, no. 5a6S2.
4. Stocator: Providing High Performance and Fault Tolerance for Apache Spark Over Object Storage [C]. Gil Vernik, Michael Factor, Elliot K. Kolodner. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2018.
5. Distributed speculations: Providing fault-tolerance and improving performance [D]. Tapus, Cristian. 2006.
6. SPARK-MSNA: Efficient algorithm on Apache Spark for aligning multiple similar DNA/RNA sequences with supervised learning [O]. V. Vineetha, C. L. Biji, Achuthsankar S. Nair.
7. Distributed speculations: providing fault-tolerance and improving performance [O]. Tapus, Cristian. 2006.
