Conference: International Conference on Information Technology, Information Systems and Electrical Engineering

Processing next generation sequencing data in map-reduce framework using hadoop-BAM in a computer cluster



Abstract

Next-generation sequencing in bioinformatics produces massive volumes of data, and big data technologies are needed to reduce the computation time of processing it. In this paper, we implement a Hadoop Map-Reduce framework for processing next-generation sequencing data using the Hadoop-BAM library. Our implementation processes a Binary Alignment Map (BAM) file, which contains a reference sequence and many aligned and unaligned reads, by splitting the BAM file into Hadoop data blocks. To process the BAM file in a computer cluster, we implement a mapper and a reducer in the Hadoop Map-Reduce framework. The mapper processes the BAM file to produce key-value pairs, while the reducer summarizes the key-value pairs into a meaningful output. Here, the mapper and reducer are written to summarize the number of bases in a BAM file. We conduct the experiments on a LIPI Hadoop cluster consisting of 96 CPU cores. The results show that our map-reduce implementation achieves a speed-up compared to serial processing of next-generation sequencing data with Picard tools.
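The mapper and reducer roles described in the abstract can be illustrated with a minimal single-process sketch. It does not use Hadoop or Hadoop-BAM: the reads are hypothetical strings standing in for records decoded from a BAM split, the names map_read and reduce_counts are invented for illustration, and the choice of the base letter as the key is an assumption, since the abstract does not state which keys the authors emit.

```python
from collections import Counter
from itertools import chain


def map_read(read_sequence):
    """Mapper role: emit one (base, 1) key-value pair per base in a read."""
    for base in read_sequence.upper():
        yield base, 1


def reduce_counts(pairs):
    """Reducer role: sum the values of all pairs that share a key."""
    totals = Counter()
    for base, count in pairs:
        totals[base] += count
    return dict(totals)


# Hypothetical read sequences; a real job would receive these from BAM records.
reads = ["ACGT", "AACC", "GGTN"]

# In a cluster the pairs would be shuffled between nodes; here they are chained.
pairs = chain.from_iterable(map_read(read) for read in reads)
base_counts = reduce_counts(pairs)
print(base_counts)
```

Because each read is mapped independently, the map step can run on separate data blocks in parallel, which is the property a cluster implementation relies on for its speed-up.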
