Cluster-based storage systems with high scalability.

Abstract

In recent years, high-end computing has undergone two significant changes: (1) an increasing focus on data-intensive applications, such as data mining, computational biology, and high-energy physics, and (2) a paradigm shift from tightly coupled, high-end proprietary computing systems to loosely coupled, cost-effective platforms built from networked commodity machines, also known as clusters. A reliable and scalable storage infrastructure for clusters has therefore become increasingly crucial for high-end computing. This dissertation investigates the effectiveness of using existing disks to build a cluster-based storage system and addresses the key problems that limit the scalability of such systems at four levels: the block data level, the metadata level, the file data level, and the application level.

At the block data level, this dissertation proposes a novel and simple replacement scheme, called RACE, which differentiates the locality of I/O streams by actively detecting access patterns inherently exhibited in two correlated spaces: the discrete block space of program contexts from which I/O requests are issued and the continuous block space within files to which I/O requests are addressed. RACE is shown to significantly outperform LRU and all other state-of-the-art cache management schemes studied in this dissertation in terms of hit ratio.

At the metadata level, this dissertation exploits the temporal locality of metadata accesses to improve metadata access performance by designing a Hierarchical Bloom filter Array (HBA) scheme that decentralizes metadata management. Our implementation indicates that HBA with 16 metadata servers can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9. A theoretical model that incorporates staleness to estimate the false rates of Bloom filters is also proposed to support adaptive Bloom filter updating.

At the file data level, this dissertation proposes using redundant data to improve I/O performance for large data accesses by dynamically scheduling I/O requests among data servers. At the application level, this work conducts a case study of a popular I/O-intensive application, parallel BLAST, and uses it as a benchmark to evaluate the techniques proposed at the file data level.
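To make the metadata-level idea more concrete, the following is a minimal Python sketch of a flat Bloom filter array used to guess which metadata server holds a file's metadata. It is not the HBA scheme itself: the hierarchy, the staleness model, and the adaptive updating described in the abstract are omitted, and every name and parameter here (`BloomFilter`, `BloomFilterArray`, `locate`, the SHA-256-based hashing, the filter sizes) is an assumption made for illustration only.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; hash positions are derived from SHA-256 (an assumption)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class BloomFilterArray:
    """One filter per metadata server, summarizing the paths that server owns."""

    def __init__(self, num_servers, num_bits=1024, num_hashes=4):
        self.filters = [BloomFilter(num_bits, num_hashes) for _ in range(num_servers)]

    def record(self, server_id, path):
        self.filters[server_id].add(path)

    def candidates(self, path):
        # Every server whose filter reports a possible hit.
        return [i for i, f in enumerate(self.filters) if f.might_contain(path)]


def locate(array, path):
    """Return the single candidate server, or None if the answer is ambiguous or empty."""
    hits = array.candidates(path)
    if len(hits) == 1:
        return hits[0]
    return None  # caller falls back, for example by querying every server


if __name__ == "__main__":
    array = BloomFilterArray(num_servers=4)
    array.record(2, "/home/alice/data.txt")
    print(locate(array, "/home/alice/data.txt"))
```

In this sketch a lookup is resolved locally only when exactly one filter reports a hit. Zero hits or several hits (false positives) push the request to a slower fallback path, which is why an estimate of filter error rates matters when deciding how often to refresh the filters.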
