Journal: Evolutionary Bioinformatics Online

Bioinformatics Workflows With NoSQL Database in Cloud Computing


Abstract

Scientific workflows can be understood as arrangements of managed activities executed by different processing entities. Applying workflows is a common Bioinformatics approach to solving problems in Molecular Biology, notably those related to sequence analysis. Because of the nature of the raw data and the in silico environment of Molecular Biology experiments, 2 practical and closely related problems have been studied apart from the research subject itself: reproducibility and the computational environment. Enhancing the reproducibility of Bioinformatics experiments requires attention to several aspects. The reproducibility requirements comprise the data provenance, which enables the acquisition of knowledge about the trajectory of data over a defined workflow, the settings of the programs, and the entire computational environment. Cloud computing is a rapidly growing alternative that can provide this computational environment, hiding technical details and delivering a more affordable, accessible, and configurable on-demand environment for researchers. For this scenario, we propose a solution to improve the reproducibility of Bioinformatics workflows in a cloud computing environment using both Infrastructure as a Service (IaaS) and Not only SQL (NoSQL) database systems. To meet this goal, we built 3 typical Bioinformatics workflows and ran them on 1 private and 2 public clouds, using different types of NoSQL database systems to persist the provenance data according to the Provenance Data Model (PROV-DM). We present the results and a guide for deploying a cloud environment for Bioinformatics, exploring the characteristics of various NoSQL database systems for persisting provenance data.
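
To make the idea of persisting provenance data concrete, the following is a minimal sketch of how one workflow step might be described with the core PROV-DM concepts (entity, activity, agent, and the relations used, wasGeneratedBy, and wasAssociatedWith) as a JSON document of the kind a document-oriented NoSQL store could hold. It is not the schema used in the paper and not the W3C PROV-JSON serialization; the function name, field layout, identifiers, program name, and parameters are all hypothetical and chosen only for illustration.

```python
import json


def build_provenance_record(activity_id, program, params,
                            input_file, output_file, agent):
    """Return a PROV-DM-style record for one workflow step as a plain dict.

    The layout is an illustrative assumption, not the paper's schema.
    """
    return {
        # Entities: the data items consumed and produced by the step.
        "entity": {
            input_file: {"role": "input"},
            output_file: {"role": "output"},
        },
        # Activity: the program execution, including its settings.
        "activity": {
            activity_id: {"program": program, "parameters": params},
        },
        # Agent: who (or what) was responsible for the execution.
        "agent": {agent: {"type": "person"}},
        # Relations linking the three concepts above.
        "used": [{"activity": activity_id, "entity": input_file}],
        "wasGeneratedBy": [{"entity": output_file, "activity": activity_id}],
        "wasAssociatedWith": [{"activity": activity_id, "agent": agent}],
    }


# Hypothetical step: an alignment program run on a reads file.
record = build_provenance_record(
    activity_id="step-1",
    program="aligner",            # placeholder program name
    params={"threads": 4},        # placeholder setting
    input_file="reads.fastq",
    output_file="aligned.sam",
    agent="researcher-1",
)

# Serialize to JSON, the form a document store would typically accept.
serialized = json.dumps(record, indent=2, sort_keys=True)
```

Keeping the program settings inside the activity and the data files as entities mirrors the reproducibility requirements listed in the abstract: the trajectory of the data, the settings of the programs, and (with further fields) the computational environment.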
