SDAFT: A novel scalable data access framework for parallel BLAST

Abstract

To run tasks in a parallel and load-balanced fashion, existing scientific parallel applications such as mpiBLAST introduce a data-initialization stage that moves database fragments from shared storage to local cluster nodes. Unfortunately, with sequence databases growing exponentially in today's big data era, such an approach is inefficient. In this paper, we develop SDAFT, a scalable data access framework that solves the data movement problem for scientific applications whose data analysis is dominated by read operations. SDAFT employs a distributed file system (DFS) to provide scalable data access for parallel sequence searches. It consists of two interlocked components: (1) a data-centric load-balanced scheduler (DC-scheduler) that enforces data-process locality and (2) a translation layer that translates conventional parallel I/O operations into HDFS I/O. In experiments with our SDAFT prototype using real-world databases and queries on a wide variety of computing platforms, we found that SDAFT can reduce I/O cost by a factor of 4-10 and double overall execution performance compared with existing schemes.
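
The abstract does not spell out how the DC-scheduler works, so the following is only a minimal sketch of the general idea of data-process locality combined with load balancing. It assumes a greedy policy: each database fragment goes to the least-loaded worker that already holds a replica, and a remote read is used only when no worker holds one. The function name `assign_fragments`, the fragment and node names, and the greedy policy itself are illustrative assumptions, not the authors' algorithm.

```python
def assign_fragments(replicas, workers):
    """Greedy locality-aware assignment (illustrative sketch only).

    replicas: dict mapping fragment id -> list of nodes holding a copy
    workers:  list of node names that run a search process
    Returns (assignment, local_count), where assignment maps each worker
    to the fragments it will search and local_count is the number of
    fragments placed on a node that already stores them.
    """
    assignment = {w: [] for w in workers}
    local_count = 0

    def candidates(frag):
        # Workers that already store a replica of this fragment.
        return [n for n in replicas[frag] if n in assignment]

    # Place the most constrained fragments first; fragments with no
    # local replica can go anywhere, so they are placed last.
    order = sorted(
        replicas,
        key=lambda f: (len(candidates(f)) == 0, len(candidates(f)), f),
    )
    for frag in order:
        local = candidates(frag)
        pool = local if local else workers  # remote read only as a fallback
        # Pick the least-loaded node; break ties by name for determinism.
        target = min(pool, key=lambda n: (len(assignment[n]), n))
        assignment[target].append(frag)
        if local:
            local_count += 1
    return assignment, local_count


if __name__ == "__main__":
    # Hypothetical replica layout: n4 stores frag4 but runs no worker.
    replicas = {
        "frag0": ["n1", "n2"],
        "frag1": ["n1", "n3"],
        "frag2": ["n2", "n3"],
        "frag3": ["n1"],
        "frag4": ["n4"],
    }
    assignment, local_count = assign_fragments(replicas, ["n1", "n2", "n3"])
    print(assignment, local_count)
```

In this sketch only fragments without a local replica trigger remote I/O, which is the cost a locality-enforcing scheduler tries to minimize. A production scheduler would also need to account for fragment sizes and uneven search times.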

Bibliographic record

  • Source
    Parallel Computing | 2014, Issue 10 | pp. 697-709 | 13 pages
  • Author affiliations

    EECS, University of Central Florida, Orlando, United States;

    EECS, University of Central Florida, Orlando, United States;

    EECS, University of Central Florida, Orlando, United States;

    Department of Computer Science, Virginia Tech, Blacksburg, VA 2406, United States;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    MPI/POSIX I/O; HDFS; Parallel sequence search; mpiBLAST;
