Processing next generation sequencing data in map-reduce framework using hadoop-BAM in a computer cluster




Abstract

Next-Generation Sequencing (NGS) in bioinformatics produces massive volumes of data, and big data technologies are needed to reduce the computation time of processing it. In this paper, we use the Hadoop MapReduce framework with the Hadoop-BAM library to process NGS data. Our implementation processes a Binary Alignment/Map (BAM) file, which contains reference sequence information and many aligned and unaligned reads, by splitting the BAM file into Hadoop data blocks. To process the BAM file on a computer cluster, we implement a mapper and a reducer for the Hadoop MapReduce framework. The mapper processes the BAM file to produce key-value pairs, and the reducer summarizes those key-value pairs into a meaningful output. Here, the mapper and reducer are designed to count the number of bases in a BAM file. We conduct the experiments on a LIPI Hadoop cluster consisting of 96 CPU cores. The experimental results show that our MapReduce implementation achieves a speed-up compared with serial NGS processing using Picard tools.
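The mapper/reducer pair described in the abstract can be illustrated with a small, single-process Python sketch of the map, shuffle, and reduce phases. It is not the authors' Hadoop implementation. It assumes that reads arrive as plain sequence strings (in the paper's setup, Hadoop-BAM would decode BAM records from each data block), that "number of bases" means a count per nucleotide letter, and that the hypothetical `splits` list stands in for Hadoop data blocks. All function names are illustrative.

```python
"""Single-process sketch of a MapReduce base-counting job.

Assumptions (not from the paper): reads are plain strings, "number of bases"
means a count per nucleotide letter, and each inner list stands in for one
Hadoop data block (input split) of a BAM file.
"""
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(read_sequence: str) -> Iterator[Tuple[str, int]]:
    """Emit a (base, count) pair for each distinct base in one read."""
    for base, count in Counter(read_sequence.upper()).items():
        yield base, count


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as Hadoop does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(base: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Sum all partial counts emitted for one base."""
    return base, sum(counts)


def run_job(splits: List[List[str]]) -> Dict[str, int]:
    """Map every read in every split, shuffle by key, then reduce per key."""
    intermediate = (pair for split in splits for read in split for pair in mapper(read))
    grouped = shuffle(intermediate)
    return dict(reducer(base, counts) for base, counts in sorted(grouped.items()))


if __name__ == "__main__":
    # Two hypothetical input splits standing in for Hadoop data blocks.
    splits = [["ACGT", "AACN"], ["ggtt"]]
    totals = run_job(splits)
    print(totals)
    print("total bases:", sum(totals.values()))
```

Running it prints `{'A': 3, 'C': 2, 'G': 3, 'N': 1, 'T': 3}` and `total bases: 12`. The mapper pre-aggregates counts within each read, which plays the role of a Hadoop combiner and reduces the volume of key-value pairs shuffled to the reducers.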


Bibliographic details

Source
International Conference on Information Technology, Information Systems and Electrical Engineering | 2017 | pp. 421-425 | 5 pages
Authors
Rifki Sadikin; Andria Arisal; Rofithah Omar; Nur Hidayah Mazni
Original format: PDF
Keywords
Sequential analysis; Tools; Next generation networking; Bioinformatics; Genomics; Random access memory; Big Data;


Similar documents


1. Hadoop-BAM: directly manipulating next generation sequencing data in the cloud [J]. Niemenmaa Matti, Kallio Aleksi, Schumacher Andre. Bioinformatics, 2012, no. 6.
2. Hadoop-BAM: directly manipulating next generation sequencing data in the cloud [J]. Keijo Heljanko. Bioinformatics, 2012, no. 6.
3. Moth-Flame Optimization-Bat Optimization: Map-Reduce Framework for Big Data Clustering Using the Moth-Flame Bat Optimization and Sparse Fuzzy C-Means [J]. Ravuri Vasavi, Vasundra S. Big Data, 2020, no. 3.
4. Processing next generation sequencing data in map-reduce framework using hadoop-BAM in a computer cluster [C]. Rifki Sadikin, Andria Arisal, Rofithah Omar. International Conference on Information Technology, Information Systems and Electrical Engineering, 2017.
5. Clustering algorithms for next-generation sequencing data from heterogenous populations [D]. Prabhakara, Shruthi. 2012.
6. Hadoop-BAM: directly manipulating next generation sequencing data in the cloud [O]. Matti Niemenmaa, Aleksi Kallio, André Schumacher.
7. Hadoop-BAM: directly manipulating next generation sequencing data in the cloud [O]. Niemenmaa, Matti, Kallio, Aleksi, Schumacher, André. 2012.
