Scalability and Validation of Big Data Bioinformatics Software

Abstract

This review examines two important aspects that are central to modern big data bioinformatics analysis: software scalability and validity. We argue that not only are the issues of scalability and validation common to all big data bioinformatics analyses, but they can also be tackled by conceptually related methodological approaches, namely divide-and-conquer (scalability) and multiple executions (validation). Scalability is defined as the ability of a program to scale with its workload. It has always been an important consideration when developing bioinformatics algorithms and programs. Nonetheless, the surge in the volume and variety of biological and biomedical data has posed new challenges. We discuss how modern cloud computing and big data programming frameworks such as MapReduce and Spark are being used to effectively implement divide-and-conquer in a distributed computing environment. Validation of software is another important issue in big data bioinformatics, and one that is often ignored. Software validation is the process of determining whether the program under test fulfils the task for which it was designed. Determining the correctness of the computational output of big data bioinformatics software is especially difficult due to the large input space and complex algorithms involved. We discuss how state-of-the-art software testing techniques that are based on the idea of multiple executions, such as metamorphic testing, can be used to implement an effective bioinformatics quality assurance strategy. We hope this review will raise awareness of these critical issues in bioinformatics.
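The abstract pairs two ideas: divide-and-conquer for scalability and multiple executions (metamorphic testing) for validation. As a minimal sketch, not the authors' code, the Python below assumes k-mer counting as an example workload. The input is split into independent chunks that are counted separately and then merged, a single-process stand-in for the map and reduce steps a framework such as MapReduce or Spark would distribute. The same program is then executed several times on related inputs, and relations between the outputs are checked instead of comparing against a known-correct answer. All function names (`count_kmers`, `split`, `divide_and_conquer_kmers`, `buggy_kmers`, `metamorphic_violations`) and the three relations are hypothetical illustrations.

```python
import random
from collections import Counter
from functools import reduce


def count_kmers(reads, k):
    """Map step: count every length-k substring (k-mer) in one chunk of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split(reads, n_chunks):
    """Divide step: deal the reads round-robin into n_chunks independent chunks."""
    return [reads[i::n_chunks] for i in range(n_chunks)]


def divide_and_conquer_kmers(reads, k, n_chunks):
    """Count each chunk independently (as parallel map tasks would), then
    merge the partial counts by summing them (the reduce step)."""
    partials = [count_kmers(chunk, k) for chunk in split(reads, n_chunks)]
    return reduce(lambda a, b: a + b, partials, Counter())


def buggy_kmers(reads, k, n_chunks):
    """Hypothetical faulty variant: Counter union (|) keeps the per-k-mer
    maximum across chunks instead of the sum."""
    partials = [count_kmers(chunk, k) for chunk in split(reads, n_chunks)]
    return reduce(lambda a, b: a | b, partials, Counter())


def metamorphic_violations(count_fn, reads, k, seed=0):
    """Execute count_fn several times on related inputs and report which
    metamorphic relations between the outputs fail. No expected output
    (test oracle) is needed, only relations that must hold between runs."""
    reads = list(reads)
    baseline = count_fn(reads, k, 1)
    violations = []
    # MR1: changing how the input is partitioned must not change the result.
    if any(count_fn(reads, k, n) != baseline for n in (2, 3, 7)):
        violations.append("MR1: partitioning changes the counts")
    # MR2: permuting the order of the reads must not change the result.
    shuffled = reads[:]
    random.Random(seed).shuffle(shuffled)
    if count_fn(shuffled, k, 4) != baseline:
        violations.append("MR2: read order changes the counts")
    # MR3: concatenating the input with itself must double every count.
    doubled = Counter({kmer: 2 * c for kmer, c in baseline.items()})
    if count_fn(reads + reads, k, 4) != doubled:
        violations.append("MR3: duplicated input does not double the counts")
    return violations


reads = ["ACGTACGT", "CGTACG", "TTACG"]
print(metamorphic_violations(divide_and_conquer_kmers, reads, k=3))  # prints []
print(metamorphic_violations(buggy_kmers, reads, k=3))  # reports MR1, MR2 and MR3
```

The faulty merge returns exactly the right counts whenever there is only one chunk, so a single run on a small test file would not expose it; the relations that vary the partitioning and the input catch it without any precomputed reference output.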

Bibliographic Details

Journal: Computational and Structural Biotechnology Journal
Authors: Andrian Yang; Michael Troup; Joshua W.K. Ho
Year (Volume): 2017 (15)
Pages: 379–386
Total pages: 8
Original format: PDF
Chinese Library Classification: Biology

Similar Articles

1. Scalability and Validation of Big Data Bioinformatics Software [J]. Andrian Yang, Michael Troup, Joshua W.K. Ho. Computational and Structural Biotechnology Journal. 2017, No. 1
2. The Online Bioinformatics Resources Collection at the University of Pittsburgh Health Sciences Library System—a one-stop gateway to online bioinformatics databases and software tools [J]. Ansuman Chattopadhyay, Cynthia Gadd, Nancy Tannery. Nucleic Acids Research. 2007, Suppl. 1
3. MetaBasis: A Web-Based Database Containing Metadata on Software Tools and Databases in the Field of Bioinformatics [J]. Atlamazoglou Vassilis, Thireou Trias, Hamodrakas Yannis. Applied Bioinformatics. 2006, No. 3
4. Validation of a Building Thermal Model in CLIM2000 Simulation Software Using Full-Scale Experimental Data, Sensitivity Analysis and Uncertainty Analysis [C]. Gilles Guyon, Nadia Rahni. Proceedings of Building Simulation '97: Fifth IBPSA Conference and Exhibition. 1997
5. Scalable and robust clustering and visualization for large-scale bioinformatics data [D]. Ruan, Yang. 2014
6. The Gaggle: An open-source software system for integrating bioinformatics software and data sources [O]. Paul T Shannon, David J Reiss, Richard Bonneau. 2006
7. Scalability and Validation of Big Data Bioinformatics Software [O]. Andrian Yang, Michael Troup, Joshua W.K. Ho. 2017