Bioinformatics (NIH literature collection)
Application and evaluation of automated semantic annotation of gene expression experiments

Abstract

Motivation: Many microarray datasets are available online, with formalized standards describing the probe sequences and expression values. Unfortunately, the descriptions, conditions and parameters of the experiments are less commonly formalized and often appear only as natural-language text. This hinders searching, high-throughput analysis, organization and integration of the datasets.

Results: We use the lexical resources and software tools of the Unified Medical Language System (UMLS) to extract concepts from text. We then link the UMLS concepts to classes in open biomedical ontologies. The result is a set of accessible, clear semantic annotations of gene expression experiments. We applied the method to 595 expression experiments from Gemma, a resource for the re-use and meta-analysis of gene expression profiling data. We evaluated and corrected all stages of the annotation process. Most missed annotations were due to a lack of cross-references. The most error-prone stage was the extraction of concepts from phrases. A final review of the annotations in the context of the experiments revealed 89% precision; a naive system lacking the phrase-to-concept corrections is 68% precise. We have integrated this annotation pipeline into Gemma.

Availability: The source code, documentation and are available at . The results of the manual evaluations are provided as . Both manual and predicted annotations can be viewed and searched via the Gemma website at . The complete set of predicted annotations is available as a machine-readable Resource Description Framework (RDF) graph.

Contact:
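The two failure modes reported above (missed annotations caused by missing cross-references, and errors in the phrase-to-concept step) can be illustrated with a toy sketch. It is not the authors' pipeline and does not call any UMLS tool: it swaps in a naive whole-word dictionary lookup. Every phrase, concept ID (e.g. `C_HIPPO`), ontology class (e.g. `onto:hippocampus`) and function name below is a hypothetical placeholder chosen for illustration, not a real UMLS CUI or ontology term. Precision is computed as the fraction of predicted annotations that appear in a manually judged gold set.

```python
import re

# Hypothetical phrase -> concept lexicon (placeholder IDs, NOT real UMLS CUIs).
LEXICON = {
    "hippocampus": "C_HIPPO",
    "mouse": "C_MOUSE",
    "lithium": "C_LITHIUM",
    "treated": "C_THERAPY",  # plausible but wrong mapping in this context
}

# Hypothetical concept -> ontology class cross-references.
# C_LITHIUM deliberately has no cross-reference, so it can never be annotated.
XREFS = {
    "C_HIPPO": {"onto:hippocampus"},
    "C_MOUSE": {"onto:mouse"},
    "C_THERAPY": {"onto:therapy"},
}

# Manually rejected (phrase, concept) pairs, standing in for phrase-to-concept corrections.
REJECTED = {("treated", "C_THERAPY")}


def extract_concepts(text, lexicon, rejected=frozenset()):
    """Return (phrase, concept) pairs whose phrase occurs as a whole word in text."""
    lowered = text.lower()
    pairs = set()
    for phrase, concept in lexicon.items():
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            if (phrase, concept) not in rejected:
                pairs.add((phrase, concept))
    return pairs


def annotate(text, lexicon, xrefs, rejected=frozenset()):
    """Map extracted concepts to ontology classes; concepts lacking xrefs are dropped."""
    classes = set()
    for _phrase, concept in extract_concepts(text, lexicon, rejected):
        classes |= xrefs.get(concept, set())
    return classes


def precision(predicted, gold):
    """Fraction of predicted annotations judged correct (0.0 if nothing was predicted)."""
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


if __name__ == "__main__":
    description = "Hippocampus from mouse treated with lithium"
    gold = {"onto:hippocampus", "onto:mouse", "onto:lithium"}

    naive = annotate(description, LEXICON, XREFS)
    corrected = annotate(description, LEXICON, XREFS, REJECTED)
    print("naive:", sorted(naive), precision(naive, gold))
    print("corrected:", sorted(corrected), precision(corrected, gold))
```

In this toy run the naive system emits the spurious `onto:therapy` class and loses precision, the rejected pair removes it, and `onto:lithium` is missed in both runs because its concept has no cross-reference. The numbers are illustrative only and are unrelated to the 89% and 68% figures reported for Gemma.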