
Hadoop-BAM: directly manipulating next generation sequencing data in the cloud


Abstract

Summary: Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that operate directly on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability and that moving data in and out of Hadoop between analysis steps should be avoided.

Availability: Hadoop-BAM is available under the open-source MIT license.

Supplementary information: Supplementary data are available at Bioinformatics online.
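The map and reduce pattern that the summary describes for coverage summarization can be illustrated with a minimal sketch in plain Python. This is not the Hadoop-BAM or Picard API: the `Read` tuple, the `BIN_SIZE` value, and the functions `map_read`, `reduce_bin` and `run_job` are all hypothetical names chosen for illustration, and the in-memory grouping only stands in for the shuffle step that Hadoop would perform across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for an aligned BAM record:
# (reference name, 1-based start position, aligned length in bases).
Read = Tuple[str, int, int]
BinKey = Tuple[str, int]

BIN_SIZE = 100  # assumed summary resolution in base pairs


def map_read(read: Read, bin_size: int = BIN_SIZE) -> Iterator[Tuple[BinKey, int]]:
    """Map step: emit ((reference, bin index), covered bases) per overlapped bin."""
    ref, start, length = read
    pos = start - 1          # convert to a 0-based position
    end = pos + length       # exclusive end of the alignment
    while pos < end:
        bin_index = pos // bin_size
        bin_end = (bin_index + 1) * bin_size
        covered = min(end, bin_end) - pos
        yield (ref, bin_index), covered
        pos += covered


def reduce_bin(key: BinKey, values: List[int], bin_size: int = BIN_SIZE) -> Tuple[BinKey, float]:
    """Reduce step: mean coverage depth over one bin."""
    return key, sum(values) / bin_size


def run_job(reads: Iterable[Read], bin_size: int = BIN_SIZE) -> Dict[BinKey, float]:
    """Local simulation of map, shuffle (group by key) and reduce."""
    grouped: Dict[BinKey, List[int]] = defaultdict(list)
    for read in reads:
        for key, covered in map_read(read, bin_size):
            grouped[key].append(covered)
    return dict(reduce_bin(key, values, bin_size) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    example_reads = [("chr1", 1, 100), ("chr1", 51, 100), ("chr2", 95, 10)]
    for (ref, bin_index), depth in run_job(example_reads).items():
        print(ref, bin_index * BIN_SIZE + 1, depth)
```

Keying the intermediate values by reference and bin means each reducer only needs the values for one bin, which is what allows this kind of summary to be spread across many nodes without any node holding the whole file.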
