Journal: Future Generation Computer Systems

An improved technique for increasing availability in Big Data replication



Abstract

Big Data represents a major challenge for the performance of cloud computing storage systems. Distributed file systems (DFS) such as the Hadoop Distributed File System (HDFS) and the Google File System (GFS) are widely used to store big data. These DFS replicate and store data as multiple copies to provide availability and reliability, but at the cost of increased storage and resource consumption. In previous work (Kaseb, Khafagy, Ali, & Saad, 2018), we built a Redundant Independent Files (RIF) system over a cloud provider (CP), called CPRIF, which provides HDFS without replicas to improve overall performance by reducing storage space, resource consumption, and operational costs, and by improving writing and reading performance. However, RIF suffers from limited availability, limited reliability, and increased data recovery time. In this paper, we overcome the limitations of RIF by giving the system more chances to recover a lost block (availability) and the ability to keep working in the presence of a lost block (reliability), with less computation (time overhead), while keeping the storage and resource-consumption benefits that RIF attains over other systems. We call this technique "High Availability Redundant Independent Files" (HARIF), which is built over a CP called CPHARIF. According to experimental results using the TeraGen benchmark, the execution time of recovering data, availability, and reliability of HARIF are improved compared with RIF, and the stored data size and resource consumption of HARIF are reduced compared to the other systems. Big Data storage is saved, and data writing and reading are improved. (C) 2018 Elsevier B.V. All rights reserved.
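The abstract does not give the internals of RIF or HARIF, but the core idea it describes, recovering a lost block without storing full replicas, is commonly realized with parity or erasure coding. As a hedged illustration only (all names and the XOR-parity scheme here are assumptions, not the paper's method), a single parity block lets one lost data block be rebuilt from the survivors:

```python
# Hypothetical sketch: XOR-parity recovery of a lost block, in the spirit
# of RAID-5-style schemes. This is NOT the paper's HARIF algorithm, only
# an illustration of availability without full replication.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks into one block."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Split data into fixed-size blocks and compute one parity block.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing one block: it is rebuilt from the surviving blocks plus
# parity, so availability is preserved with far less storage than copies.
lost_index = 1
survivors = [blk for i, blk in enumerate(data_blocks) if i != lost_index]
recovered = xor_blocks(survivors + [parity])
assert recovered == data_blocks[lost_index]
```

With n data blocks, this costs one extra block of storage instead of the (n x replication-factor) cost of full copies, at the price of extra computation during recovery, which matches the storage-versus-recovery-time trade-off the abstract discusses.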
