Conference: IEEE International Conference on Bioinformatics and Bioengineering

Streaming Distributed DNA Sequence Alignment Using Apache Spark



Abstract

Next-Generation Sequencing (NGS) technology generates large amounts of data, usually on the order of hundreds of gigabytes per experiment, which must be analyzed quickly to produce meaningful variant results. The first step in analyzing such data is to map the sequenced reads to their corresponding positions in the human genome. One of the most popular tools for this sequence alignment is the Burrows-Wheeler Aligner (BWA mem). One limitation of the BWA program, though, is that it cannot be run on a cluster. In this paper, we propose StreamBWA, a new framework that allows the BWA mem program to run on a cluster in a distributed fashion while the input data is still being streamed into the cluster. It can process the input data directly from a compressed file, which can reside either on the local file system or at a URL. Moreover, StreamBWA can start combining the output files of the distributed BWA mem tasks while those tasks are still executing on the cluster. Empirical evaluation shows that this streaming distributed approach is approximately 2x faster than the non-streaming approach. Furthermore, our streaming distributed approach is 5x faster than other state-of-the-art solutions such as SparkBWA. The source code of StreamBWA is publicly available at https://github.com/HamidMushtaq/StreamBWA.
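The streaming-input idea in the abstract can be illustrated with a minimal Python sketch. This is not StreamBWA's implementation: the function names `fastq_chunks` and `stream_chunks`, the gzip format, the single-end 4-line FASTQ record layout, and the fixed chunk size in reads are all assumptions made for illustration. It only shows how a compressed input, local or at a URL, might be decompressed incrementally and cut into chunks that become available before the whole file has been read.

```python
import gzip
import urllib.request
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_chunks(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding up to reads_per_chunk records.

    Assumes the common 4-line FASTQ record layout (header, sequence, '+', quality).
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        yield chunk


def stream_chunks(source: str, reads_per_chunk: int) -> Iterator[List[str]]:
    """Decompress a gzip FASTQ incrementally from a local path or a URL.

    Hypothetical helper: chunks are yielded as soon as enough lines have been
    decompressed, so a consumer can start work before the input is fully read.
    """
    is_url = source.startswith(("http://", "https://", "ftp://"))
    raw = urllib.request.urlopen(source) if is_url else open(source, "rb")
    # gzip.open accepts an already-open binary file object and wraps it as text.
    with raw, gzip.open(raw, mode="rt", encoding="ascii") as text:
        yield from fastq_chunks(text, reads_per_chunk)
```

In a setup like the one the abstract describes, each yielded chunk would presumably be handed to a cluster task that runs BWA mem on it; that dispatch step, and the merging of the per-chunk outputs, are left out of this sketch.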
