Journal: Computational and Structural Biotechnology Journal

Scalability and Validation of Big Data Bioinformatics Software

Abstract

This review examines two aspects that are central to modern big data bioinformatics analysis: software scalability and validity. We argue that the issues of scalability and validation are not only common to all big data bioinformatics analyses but can also be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability of a program to scale with its workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless, the surge in the volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to implement divide-and-conquer effectively in a distributed computing environment. Validation of software is another important issue in big data bioinformatics, and one that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult because of the large input space and the complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
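The two ideas named in the abstract can be illustrated together in a small sketch. The example below is not taken from the review: k-mer counting is chosen here only as a convenient stand-in task, all function names are hypothetical, and an in-process `map` and `reduce` stand in for what a framework such as MapReduce or Spark would distribute across machines. The first half shows divide-and-conquer (each read is counted independently, then partial counts are merged). The second half shows metamorphic testing: instead of checking the output against a known correct answer, the program is executed several times on related inputs and the outputs are checked against relations that must hold between them.

```python
from collections import Counter
from functools import reduce
import random


def count_kmers(read, k):
    """Map step (hypothetical helper): count the k-mers of a single read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def merge_counts(left, right):
    """Reduce step: combine two partial k-mer counts."""
    return left + right


def kmer_counts(reads, k=3):
    """Divide-and-conquer: process each read independently, then merge.

    The map calls do not depend on each other, which is the property a
    distributed framework would exploit to spread them across workers.
    """
    partials = map(lambda read: count_kmers(read, k), reads)
    return reduce(merge_counts, partials, Counter())


def check_metamorphic_relations(reads, k=3, seed=0):
    """Multiple executions: run the program on related inputs and compare.

    Relation 1: reordering the reads must not change the counts.
    Relation 2: supplying every read twice must double every count.
    """
    base = kmer_counts(reads, k)

    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    order_invariant = kmer_counts(shuffled, k) == base

    doubled = kmer_counts(reads + reads, k)
    expected = Counter({kmer: 2 * n for kmer, n in base.items()})
    doubling_holds = doubled == expected

    return order_invariant and doubling_holds


# Toy input, assumed purely for illustration.
reads = ["ACGTAC", "GTACGT", "TTACG"]
counts = kmer_counts(reads)
relations_hold = check_metamorphic_relations(reads)
```

Neither relation requires knowing the correct counts in advance, which is why this style of testing suits programs whose outputs are hard to verify directly. A violated relation signals a defect, while satisfied relations raise confidence without proving correctness.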