Enabling Data-Guided Evaluation of Bioinformatics Workflow Quality

Abstract

Bioinformatics can be divided into two phases: the first converts raw data into processed data, and the second uses the processed data to obtain scientific results. It is important to consider the first, "workflow" phase carefully, as there are many paths to a final processed dataset. Some workflow paths may differ enough to influence the second phase, thereby leading to ambiguity in the scientific literature. Workflow evaluation in bioinformatics enables investigators to carefully plan how to process their data. A system that uses real data to determine the quality of a workflow can be based on the inherent biological relationships in the data itself. To our knowledge, a general software framework that performs real data-driven evaluation of bioinformatics workflows does not exist.

The Evaluation and Utility of workFLOW (EUFLOW) decision-theoretic framework, developed and tested on gene expression data, enables users of bioinformatics workflows to evaluate alternative workflow paths using inherent biological relationships. EUFLOW is implemented as an R package so that users can evaluate workflow data. The framework also permits user-guided utility and loss functions, which allow the type of analysis to be considered in the workflow path decision. This framework was originally developed to address the quality of identifier mapping services between UNIPROT accessions and Affymetrix probesets to facilitate integrated analysis [1]. An extension to this framework evaluates Affymetrix probeset filtering methods on real data from endometrial cancer and TCGA ovarian serous carcinoma samples [2]. Further evaluation of RNASeq workflow paths demonstrates the generalizability of the EUFLOW framework. Three separate evaluations are performed: 1) identifier filtering of features with biological attributes, 2) choice of the threshold selection parameter for low gene count features, and 3) commonly utilized RNASeq data workflow paths on The Cancer Genome Atlas data.

The EUFLOW decision-theoretic framework developed and tested in my dissertation enables users of bioinformatics workflows to evaluate alternative workflow paths guided by inherent biological relationships and user utility.

Bibliographic details

  • Author

    McDade Kevin;

  • Author affiliation
  • Year: 2017
  • Total pages
  • Original format: PDF
  • Language: en
  • CLC classification
