Conference: 2017 20th International Conference of Computer and Information Technology

Cloud-POA: A cloud-based map only implementation of PO-MSA on Amazon multi-node EC2 Hadoop Cluster

Abstract

Sequence alignment has always been a challenging task in bioinformatics and computational biology. With Next Generation Sequencing (NGS) techniques in hand, researchers can now study biological systems at a level that was never possible before. Scientists now have billions of bytes of biological data to work with and trillions of sequences to align. This comes at the cost of requiring machines with a tremendous amount of computational and analytical power. Purchasing that much hardware and setting up a standalone infrastructure would not only cost an unnecessarily large amount of money and labor but would also be troublesome to maintain. Moreover, aligning a huge number of DNA or protein sequences requires a scalable multiple sequence alignment (MSA) algorithm with decent accuracy. In this context, this paper presents a novel implementation of the Partial Order Alignment (POA) algorithm on a multi-node Hadoop cluster running the MapReduce framework. The implementation was done on the Amazon AWS platform with multiple EC2 instances. It is a map-only implementation using Hadoop Streaming. The results show a drastic reduction in runtime with no degradation in accuracy.
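The abstract does not describe how the mapper is written, so the following is only a minimal sketch of what a map-only Hadoop Streaming mapper could look like in Python. It assumes a hypothetical input format of one `seq_id<TAB>sequence` record per line, and it uses a hypothetical `placeholder_align` function that merely pads sequences to a common length. That function is a stand-in, not POA; a real job would replace it with a call to an actual POA implementation. None of the names below come from the paper.

```python
import sys
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence id, sequence); hypothetical record layout


def parse_lines(lines: Iterable[str]) -> Iterator[Record]:
    """Parse 'seq_id<TAB>sequence' lines, skipping blank or malformed ones."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2 or not parts[0] or not parts[1]:
            continue
        yield parts[0], parts[1]


def placeholder_align(records: List[Record]) -> List[Record]:
    """Stand-in for the aligner (assumption): right-pad with '-' to equal length.

    This is NOT Partial Order Alignment; it only marks where a real
    aligner would be plugged in.
    """
    width = max((len(seq) for _, seq in records), default=0)
    return [(seq_id, seq.ljust(width, "-")) for seq_id, seq in records]


def run_mapper(
    lines: Iterable[str],
    align: Callable[[List[Record]], List[Record]] = placeholder_align,
) -> Iterator[str]:
    """Align all records in this mapper's input split and emit key<TAB>value lines.

    Each mapper sees only its own split, so no reducer is needed to
    combine results; that is what makes the job map-only.
    """
    records = list(parse_lines(lines))
    for seq_id, row in align(records):
        yield f"{seq_id}\t{row}"


def main() -> None:
    """Entry point for Hadoop Streaming: read stdin, write stdout."""
    for out_line in run_mapper(sys.stdin):
        print(out_line)
```

To use this as a streaming mapper, `main()` would be called under the usual `if __name__ == "__main__":` guard. A streaming job becomes map-only when the number of reduce tasks is set to zero, for example with `-D mapreduce.job.reduces=0`, so each mapper's output is written directly to the job's output directory.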