A Task-Based Approach for Large-Scale Evaluation of the Gene Ontology

Abstract

The Gene Ontology (GO) provides a framework to systematically classify and annotate gene function. The annotations associated with GO play a critical role in modern biology and cover many organisms. For the human genome, over 10,000 GO terms are used to annotate gene function in an expansive database of over 200,000 annotations. Due to the importance of the GO annotations in modern biology, significant effort has been put into assessing the quality of the annotations. Providing measures of annotation completeness, accuracy, and precision is critical if researchers are to use the annotations in real-world applications with confidence. Here, we describe a task-based approach that examines the completeness and utility of GO annotations through the lens of gene enrichment analysis. Our approach can be used to model the progression of the GO annotations over time, either for a particular area of interest or for the body of annotations as a whole. Using this framework, we conducted a large-scale analysis of gene expression datasets from the NCBI Gene Expression Omnibus (GEO). In particular, we identified terms of interest for each dataset through semantic annotation of biomedical data, then tracked the behavior of these terms as a function of time. The preliminary results provide significant information about the progress and character of GO annotations over time. This framework is flexible enough to examine all or part of the GO annotations, across multiple species, and with various enrichment methods. We also discuss how this framework can be used to evaluate different annotation methods. For example, by comparing the performance of annotations generated with a particular method to the performance of canonical annotations, it is possible to determine their relative quality.
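The idea of tracking a term of interest across annotation releases can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract leaves the enrichment method open, so a one-sided hypergeometric test is assumed here, and the function names (`hypergeom_pvalue`, `enrichment_over_time`), the gene identifiers, the snapshot labels, and the term-to-gene mappings are all hypothetical toy values.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the study set, k: study genes annotated to the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrichment_over_time(study_genes, background, snapshots, term):
    """Return {snapshot_label: p-value} for one GO term.

    `snapshots` maps a label (e.g. a release year) to a dict of
    term -> annotated genes, standing in for dated annotation releases.
    """
    background = set(background)
    study = set(study_genes) & background
    results = {}
    for label, annotations in snapshots.items():
        annotated = set(annotations.get(term, ())) & background
        overlap = len(study & annotated)
        results[label] = hypergeom_pvalue(
            overlap, len(study), len(annotated), len(background)
        )
    return results


# Toy data (assumed for illustration only): 20 background genes,
# 5 study genes, and two hypothetical annotation snapshots.
background = [f"g{i}" for i in range(20)]
study_genes = ["g0", "g1", "g2", "g3", "g4"]
snapshots = {
    "2005": {"GO:0006915": ["g0", "g10"]},
    "2010": {"GO:0006915": ["g0", "g1", "g2", "g10"]},
}

pvalues = enrichment_over_time(study_genes, background, snapshots, "GO:0006915")
for label, p in sorted(pvalues.items()):
    print(label, round(p, 4))
```

In this toy setup the later snapshot annotates more of the study genes to the term, so its p-value is smaller. A falling p-value for a term already known to be relevant to a dataset is the kind of signal that a time-based evaluation of annotation completeness could track.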
