
Scalable file systems and operating systems support for big data applications.



Abstract

The decades-old concepts and assumptions behind traditional file system design have been rendered partially invalid, and have become sources of performance bottlenecks, by the arrival of Big Data computing at the application level and of emerging substrate technologies such as manycore CPUs and Storage Class Memory (SCM) at the architecture level.

This dissertation starts with a thorough literature review that gives a big picture of existing data-management solutions, as well as state-of-the-art OS scalability research in the context of manycore CPUs and SCM devices. It then presents three related but orthogonal solutions that tackle file system scalability issues at different layers.

The first solution is a distributed file-search service called Propeller. The main challenge in this research is to provide a high-performance and accurate file-search service without significantly impacting intensive I/O in large file systems. In this work, we therefore explore and exploit application-aware access patterns, captured by Access-Causality Graphs (ACGs), to help Propeller optimize its file-index partitioning strategy. By applying the ACGs captured from applications, Propeller is able to retain access locality within a small index, which further improves index performance.

In the second solution, we re-evaluate the design of modern file systems and rethink the file system namespace. We propose a new form of file system, the searchable file system, in which the identities of files and directories are based on user queries. A prototype of this new form of file system, called Versatile Searchable File System (VSFS), is implemented to demonstrate its feasibility and benefits.

The third solution evaluates the scalability of the latest Linux kernel storage stack on top of manycore CPUs and SCM. Our evaluations demonstrate that the current Linux storage stack scales poorly on high-core-count NUMA systems, which strongly suggests that Linux kernel developers should revise the shared-memory model underlying the design of the storage stack. We propose a distributed Virtual File System that eliminates the cache coherence overhead for directory entries and improves I/O parallelism by reducing lock contention.

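Bibliographic record

  • Author

    Xu, Lei

  • Author affiliation

    The University of Nebraska - Lincoln

  • Degree-granting institution: The University of Nebraska - Lincoln
  • Subject: Computer engineering
  • Degree: Ph.D.
  • Year: 2014
  • Pagination: 138 p.
  • Total pages: 138
  • Original format: PDF
  • Language of text: English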
