Published in GigaScience (U.S. National Institutes of Health literature collection)

Bioinformatics applications on Apache Spark

Abstract

With the rapid development of next-generation sequencing technology, ever-increasing quantities of genomic data pose a tremendous challenge to data processing, creating an urgent need for highly scalable and powerful computational systems. Among state-of-the-art parallel computing platforms, Apache Spark is a fast, general-purpose, in-memory, iterative computing framework for large-scale data processing; it achieves high fault tolerance and high scalability through its resilient distributed dataset (RDD) abstraction. In terms of performance, Spark can run up to 100 times faster than Hadoop when data are processed in memory and up to 10 times faster when data are read from disk. Moreover, it provides high-level application programming interfaces in Java, Scala, Python, and R, and it includes higher-level components such as Spark SQL for structured data processing, MLlib for machine learning, GraphX for graph computation, and Spark Streaming for stream processing. We surveyed Spark-based applications used in next-generation sequencing and in other biological domains, such as epigenetics, phylogeny, and drug discovery, and use the results of this survey to provide a comprehensive guideline that allows bioinformatics researchers to apply Spark in their own fields.
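The abstract credits Spark's fault tolerance to the resilient distributed dataset (RDD) abstraction. As a rough illustration of that idea only, and not of Spark's actual implementation or API, the plain-Python sketch below models an RDD as a partitioned dataset that records its lineage (parent dataset plus transformation), keeps computed partitions in memory, and rebuilds a lost partition by replaying that lineage instead of restoring a replica. The class `ToyRDD` and the names `from_list`, `lose_partition`, and `compute_count` are hypothetical and invented for this example; `map`, `filter`, and `collect` merely mirror the names of Spark's RDD operations. The sequencing reads are made-up toy data.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Hypothetical, single-process toy model of the RDD idea (not Spark's API):
    an immutable, partitioned dataset that remembers its lineage
    (parent + transformation) so that lost partitions can be rebuilt."""

    def __init__(self, source: Optional[List[List]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List], List]] = None) -> None:
        self._source = source      # raw partitions; only set on a base dataset
        self._parent = parent      # lineage: the dataset this one derives from
        self._fn = fn              # per-partition transformation of the parent
        self._cache: Dict[int, List] = {}  # in-memory copies of computed partitions
        self.compute_count = 0     # how many partitions were (re)computed here

    @classmethod
    def from_list(cls, data: List, num_partitions: int) -> "ToyRDD":
        # Split the input into contiguous chunks, one per partition.
        size = -(-len(data) // num_partitions)  # ceiling division
        return cls(source=[data[i * size:(i + 1) * size]
                           for i in range(num_partitions)])

    def num_partitions(self) -> int:
        if self._parent is None:
            return len(self._source)
        return self._parent.num_partitions()

    # Transformations are lazy: they only record lineage, nothing is computed.
    def map(self, f: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def _compute(self, i: int) -> List:
        # Serve partition i from memory when possible; otherwise rebuild it
        # by recursively replaying the lineage back to the source data.
        if i in self._cache:
            return self._cache[i]
        if self._parent is None:
            result = list(self._source[i])
        else:
            result = self._fn(self._parent._compute(i))
        self.compute_count += 1
        self._cache[i] = result
        return result

    # Actions trigger computation of every partition.
    def collect(self) -> List:
        out: List = []
        for i in range(self.num_partitions()):
            out.extend(self._compute(i))
        return out

    def lose_partition(self, i: int) -> None:
        # Simulate a worker failure: drop the in-memory copy of partition i
        # here and in every ancestor. The base source data is kept, as if it
        # lived in stable storage such as HDFS.
        self._cache.pop(i, None)
        if self._parent is not None:
            self._parent.lose_partition(i)


# Toy reads: drop reads containing ambiguous bases (N), then count G/C bases.
reads = ["ACGT", "GGCC", "ANGT", "ATAT", "GCGC", "NNNN"]
base = ToyRDD.from_list(reads, num_partitions=2)
gc = base.filter(lambda r: "N" not in r).map(lambda r: sum(c in "GC" for c in r))

print(gc.collect())        # [2, 4, 0, 4]
gc.lose_partition(1)       # partition 1 disappears from memory
print(gc.collect())        # [2, 4, 0, 4], partition 1 rebuilt from lineage
print(gc.compute_count)    # 3: two initial partitions plus one recomputation
```

In real Spark the same pipeline would be written with `SparkContext.parallelize`, `RDD.filter`, `RDD.map`, and `RDD.collect` in PySpark or Scala, with partition placement, scheduling, and recovery handled by the cluster. One difference worth noting: Spark keeps RDD partitions in memory only when asked to via `cache()` or `persist()`, whereas this toy caches every computed partition automatically to keep the example short.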
