Cluster-based storage systems with high scalability.

Abstract

In recent years, high-end computing has undergone two significant changes: (1) an increasing focus on data-intensive applications, such as data mining, computational biology, and high-energy physics; and (2) a paradigm shift from tightly coupled, proprietary high-end computing systems to loosely coupled, cost-effective platforms built from networked commodity machines, also known as clusters. A reliable and scalable storage infrastructure for clusters has therefore become increasingly crucial for high-end computing. This dissertation investigates the effectiveness of using existing disks to build a cluster-based storage system, and it addresses the key problems that limit the scalability of such systems at four levels: the block data level, the metadata level, the file data level, and the application level.

At the block data level, this dissertation proposes a novel and simple cache replacement scheme, called RACE, which differentiates the locality of I/O streams by actively detecting the access patterns inherently exhibited in two correlated spaces: the discrete block space of the program contexts from which I/O requests are issued, and the continuous block space within the files to which I/O requests are addressed. In terms of hit ratio, RACE is shown to significantly outperform LRU and all other state-of-the-art cache management schemes studied in this dissertation.

At the metadata level, this dissertation exploits the temporal locality of metadata accesses to improve metadata access performance by designing a Hierarchical Bloom filter Array (HBA) scheme that decentralizes metadata management. Our implementation indicates that HBA with 16 metadata servers can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9. To support adaptive Bloom filter updating, a theoretical model is proposed that incorporates staleness when estimating the false rates of Bloom filters.
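
The abstract does not spell out RACE's data structures, so the following Python sketch is only a toy illustration of the two-space idea, not the dissertation's algorithm. It assumes each request carries a program-context identifier (`pc`), a file identifier, and a block offset within the file. The continuous space is tracked per file as the current contiguous run of blocks, and the discrete space is tracked per program context as the last pattern that context produced, which serves as a prediction when a context touches a file with no history yet. The class names, fields, and the three labels (`sequential`, `looping`, `other`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    start: int             # first block of the current contiguous run
    last: int              # most recently accessed block of the run
    looping: bool = False  # set once the run has been restarted from `start`


class PatternDetector:
    """Toy two-space access-pattern classifier (hypothetical, not RACE itself)."""

    def __init__(self) -> None:
        self.runs: dict[str, Run] = {}     # continuous space: one run per file
        self.pc_hint: dict[int, str] = {}  # discrete space: last pattern per PC

    def classify(self, pc: int, file_id: str, block: int) -> str:
        run = self.runs.get(file_id)
        if run is None:
            # First access to this file: no file-level evidence yet, so predict
            # from what this program context produced before (default "other").
            self.runs[file_id] = Run(block, block)
            return self.pc_hint.get(pc, "other")
        if block == run.last + 1:
            # Next block of the run: sequential, or looping if already repeated.
            run.last = block
            pattern = "looping" if run.looping else "sequential"
        elif block == run.start and run.last > run.start:
            # Jump back to the start of a multi-block run: treat it as a loop.
            run.looping = True
            run.last = block
            pattern = "looping"
        else:
            # Irregular jump: forget the old run and start a new one here.
            self.runs[file_id] = Run(block, block)
            pattern = "other"
        self.pc_hint[pc] = pattern
        return pattern


if __name__ == "__main__":
    d = PatternDetector()
    # Program context 0x10 scans file "a" twice: blocks 0..3, then 0..3 again.
    print([d.classify(0x10, "a", b) for b in [0, 1, 2, 3, 0, 1, 2, 3]])
    # -> ['other', 'sequential', 'sequential', 'sequential',
    #     'looping', 'looping', 'looping', 'looping']
    # A new file touched by the same context inherits that context's last pattern.
    print(d.classify(0x10, "b", 100))  # -> looping
```

A replacement policy could then treat each class differently, for example evicting blocks of one-pass sequential streams early while retaining blocks that are re-read in loops; how RACE actually allocates cache space among these classes is not stated in this abstract.
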
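
As a rough illustration of a hierarchical Bloom filter array, the Python sketch below keeps two arrays of Bloom filters with one filter per metadata server: a small array for recently accessed files, which is where temporal locality pays off, and a larger array covering all files. A lookup consults the recent array first, falls back to the full array when the first level is not conclusive, and multicasts to every server as a last resort. This two-level layout is an assumption made for the example; the class names, filter sizes, and double-hashing scheme are hypothetical, and the sketch omits LRU eviction, replication of filters across servers, and the staleness-aware update model the dissertation proposes. The `estimated_false_positive_rate` helper uses the standard estimate `(1 - exp(-k*n/m))**k` for an up-to-date filter with `m` bits, `k` hash functions, and `n` keys.

```python
import hashlib
import math


class BloomFilter:
    """Plain Bloom filter: m bits, k probe positions derived by double hashing."""

    def __init__(self, m_bits: int, k: int) -> None:
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: str) -> list[int]:
        d = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # odd step between probes
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(key))


def estimated_false_positive_rate(m: int, n: int, k: int) -> float:
    # Textbook estimate for a filter that is up to date (no staleness term).
    return (1 - math.exp(-k * n / m)) ** k


class TwoLevelLocator:
    """Hypothetical two-level Bloom filter arrays, one filter per metadata server."""

    def __init__(self, servers: int, m_recent: int, m_full: int, k: int) -> None:
        self.recent = [BloomFilter(m_recent, k) for _ in range(servers)]
        self.full = [BloomFilter(m_full, k) for _ in range(servers)]

    def record_create(self, server: int, path: str) -> None:
        self.full[server].add(path)

    def record_access(self, server: int, path: str) -> None:
        # A real LRU-level filter would also forget evicted paths; omitted here.
        self.recent[server].add(path)

    def locate(self, path: str) -> list[int]:
        """Return the metadata servers that should be asked about `path`."""
        hits = [s for s, bf in enumerate(self.recent) if path in bf]
        if len(hits) == 1:
            return hits  # unique hit among recently accessed files
        hits = [s for s, bf in enumerate(self.full) if path in bf]
        if hits:
            return hits  # normally one server; extra entries are false positives
        return list(range(len(self.full)))  # no hit: multicast to every server


if __name__ == "__main__":
    loc = TwoLevelLocator(servers=4, m_recent=1024, m_full=8192, k=4)
    for i in range(200):
        loc.record_create(i % 4, f"/data/file{i}")
    loc.record_access(2, "/data/file6")
    print(loc.locate("/data/file6"))  # -> [2]
    print(estimated_false_positive_rate(8192, 50, 4))
```

Because a Bloom filter has no false negatives while it is current, the full-array step always includes the server that actually holds a file; staleness (files created after a filter copy was last refreshed) is exactly what breaks that guarantee, which is why the dissertation models it when deciding how often to update.
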
At the file data level, this dissertation proposes exploiting redundant data to improve the performance of large data accesses by dynamically scheduling I/O requests among the data servers that hold redundant copies. At the application level, this work conducts a case study of a popular I/O-intensive application, parallel BLAST, and uses it as a benchmark to evaluate the techniques proposed at the file data level.
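
To make the file-data-level idea concrete, here is a minimal Python sketch assuming each block is stored on several data servers (the hypothetical `replicas` map) and that every request costs one unit of work. The hypothetical `schedule` function sends each request to the least-loaded server holding a copy, breaking ties by server id. It is a greedy stand-in rather than the dissertation's scheduler; a real system would track outstanding requests that drain as I/O completes instead of a count that only grows.

```python
def schedule(
    requests: list[int], replicas: dict[int, list[int]], num_servers: int
) -> tuple[list[int], list[int]]:
    """Greedy replica selection: send each block request to the least-loaded holder.

    requests:    block ids in arrival order
    replicas:    hypothetical map from block id to the servers holding a copy
    num_servers: number of data servers
    """
    load = [0] * num_servers
    plan = []
    for block in requests:
        # Pick the holder with the smallest accumulated load; ties go to the lower id.
        target = min(replicas[block], key=lambda s: (load[s], s))
        load[target] += 1
        plan.append(target)
    return plan, load


if __name__ == "__main__":
    # Hypothetical mirrored layout: primary on b % 4, mirror on (b + 1) % 4.
    replicas = {b: [b % 4, (b + 1) % 4] for b in range(16)}
    plan, load = schedule([0, 4, 8, 12], replicas, num_servers=4)
    print(plan, load)  # -> [0, 1, 0, 1] [2, 2, 0, 0]
```

With blocks 0, 4, 8, and 12 all having server 0 as primary and server 1 as mirror, the greedy choice alternates between the two copies, whereas always reading the primary would place all four requests on server 0 and halve the achievable parallelism.
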

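Bibliographic Record

Author: Zhu, Yifeng
Author affiliation: The University of Nebraska - Lincoln
Degree-granting institution: The University of Nebraska - Lincoln
Subjects: Computer Science; Engineering, Electronics and Electrical
Degree: Ph.D.
Year: 2005
Pages: 212
Original format: PDF
Text language: English
Chinese Library Classification: Automation technology, computer technology; Radio electronics, telecommunication technology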

Similar Documents

1. Extensible block-level storage virtualization in cluster-based systems [J]. Michail D. Flouris, Renaud Lachaize, Konstantinos Chasapis. Journal of Parallel and Distributed Computing, 2010, no. 8.
2. HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems [J]. Zhu Yifeng, Jiang Hong, Wang Jun. IEEE Transactions on Parallel and Distributed Systems, 2008, no. 6.
3. The impact of integrated cluster-based storage allocation on parts-to-picker warehouse performance [J]. Mirzaei Masoud, Zaerpour Nima, de Koster Rene. Transportation Research, 2021, no. Feba.
4. Measurement and Analysis of TCP Throughput Collapse in Cluster-based Storage Systems [C]. Amar Phanishayee, Elie Krevat, Vijay Vasudevan. Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST '08), 2008.
5. The Collection and Storage Function Transition Point from Cluster-Based to Big Data Streaming Data [D]. Rubey, Sidney I., 2018.
6. Software engineering risk factors in the implementation of a small electronic medical record system: the problem of scalability [O]. Michael F. Chiang, Justin B. Starren, 2002.
7. HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems [O]. Yifeng Zhu, Hong Jiang, Jun Wang, 2013.
8. Co-Scheduling of Disk Head Time in Cluster-Based Storage [R]. Wachs, M., Ganger, G. R., 2009.