
UniIndex: An index and query middleware for parallel file systems


Abstract

As data analysis workloads on high-performance computing systems keep growing, the ability to select a small fraction of data from large scientific data sets is vital to accelerating scientific discovery. However, parallel file systems lack efficient data locating services at both file and record granularity. Existing methods for identifying and indexing data are often domain-specific and do not scale to large scientific data sets. In this paper, we describe the design and implementation of the UniIndex framework, which combines our proposed techniques for user-annotation extraction, an in-memory cache layer, in situ indexing, and parallel query processing. Acting as middleware on top of production file systems, UniIndex enables efficient data locating services with minimal user effort. Our evaluations show that UniIndex can locate target files in directories containing millions of files within microseconds. By applying in situ indexing and a lightweight range-bitmap index, record-level index building time is reduced dramatically, while queries remain up to two orders of magnitude faster than scanning the entire data set.
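
The abstract names a lightweight range-bitmap index but does not describe how it is built. As a rough illustration only, the sketch below shows a generic binned bitmap index, which is one common way to answer record-level range queries without scanning every value. It is not UniIndex's implementation: the class name `RangeBitmapIndex`, the `edges` parameter, and the use of Python integers as bitmaps are all assumptions made for this example.

```python
from bisect import bisect_right


class RangeBitmapIndex:
    """Hypothetical binned bitmap index: one bitmap (a Python int) per value range."""

    def __init__(self, values, edges):
        # edges is a sorted list of bin boundaries; bin i covers
        # [edges[i], edges[i + 1]). Values outside the edges are rejected.
        self.values = list(values)
        self.edges = list(edges)
        self.bitmaps = [0] * (len(self.edges) - 1)
        for row, v in enumerate(self.values):
            if not (self.edges[0] <= v < self.edges[-1]):
                raise ValueError(f"value {v} outside indexed range")
            b = bisect_right(self.edges, v) - 1
            self.bitmaps[b] |= 1 << row  # set the bit for this record

    def query(self, lo, hi):
        """Return sorted record ids whose value satisfies lo <= value < hi."""
        hits = 0        # records accepted from the bitmaps alone
        candidates = 0  # records in partially covered bins
        for b, bitmap in enumerate(self.bitmaps):
            b_lo, b_hi = self.edges[b], self.edges[b + 1]
            if b_hi <= lo or b_lo >= hi:
                continue  # bin does not overlap the query range
            if lo <= b_lo and b_hi <= hi:
                hits |= bitmap  # bin fully inside the range
            else:
                candidates |= bitmap  # edge bin: raw values must be checked
        rows = _set_bits(hits)
        rows += [r for r in _set_bits(candidates) if lo <= self.values[r] < hi]
        return sorted(rows)


def _set_bits(bitmap):
    """List the positions of the set bits in an integer bitmap."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows
```

In this sketch only the records in the two edge bins are compared against the raw data; every bin that lies fully inside the query range is answered from its bitmap. A production index would typically use compressed bitmaps and build them while the data is being written, but those details are beyond what the abstract states.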

Bibliographic record

  • Source
    Concurrency, practice and experience | 2020, Issue 9 | e5609.1-e5609.17 | 17 pages
  • Authors

  • Author affiliations

    Natl Univ Def Technol Coll Comp Changsha Hunan Peoples R China|State Key Lab High Performance Comp Changsha Hunan Peoples R China;

    Sun Yat Sen Univ Natl Supercomp Ctr Guangzhou Sch Data & Comp Sci Guangzhou Guangdong Peoples R China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    big data; data management; high-performance computing; index;

