
Application and evaluation of automated semantic annotation of gene expression experiments


Abstract

MOTIVATION: Many microarray datasets are available online with formalized standards describing the probe sequences and expression values. Unfortunately, the description, conditions and parameters of the experiments are less commonly formalized and often occur as natural language text. This hinders searching, high-throughput analysis, organization and integration of the datasets.

RESULTS: We use the lexical resources and software tools from the Unified Medical Language System (UMLS) to extract concepts from text. We then link the UMLS concepts to classes in open biomedical ontologies. The result is accessible and clear semantic annotations of gene expression experiments. We applied the method to 595 expression experiments from Gemma, a resource for re-use and meta-analysis of gene expression profiling data. We evaluated and corrected all stages of the annotation process. The majority of missed annotations were due to a lack of cross-references. The most error-prone stage was the extraction of concepts from phrases. Final review of the annotations in the context of the experiments revealed 89% precision. A naive system lacking the phrase-to-concept corrections is 68% precise. We have integrated this annotation pipeline into Gemma.

AVAILABILITY: The source code, documentation and Supplementary Materials are available at http://www.chibi.ubc.ca/GEOMMTX. The results of the manual evaluations are provided as Supplementary Material. Both manual and predicted annotations can be viewed and searched via the Gemma website at http://www.chibi.ubc.ca/Gemma. The complete set of predicted annotations is available as a machine-readable Resource Description Framework (RDF) graph.
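The stages named in RESULTS (phrase to concept, concept to ontology class, then a precision check) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the lookup tables, concept identifiers, class names, and the `annotate` and `precision` helpers are all hypothetical, and plain substring matching stands in for the UMLS software tools.

```python
# Toy illustration of a phrase -> concept -> ontology-class annotation pipeline.
# All tables and identifiers below are hypothetical placeholders, not UMLS data.

# Stage 1 (assumed): phrases found in text map to one or more candidate concepts.
PHRASE_TO_CONCEPTS = {
    "hippocampus": ["CONCEPT_HIPPOCAMPUS"],
    "mouse": ["CONCEPT_MOUSE_SPECIES", "CONCEPT_COMPUTER_MOUSE"],
    "cortex": ["CONCEPT_CORTEX"],
}

# Manual phrase-to-concept corrections: pairs a curator has rejected.
REJECTED_PAIRS = {("mouse", "CONCEPT_COMPUTER_MOUSE")}

# Stage 2 (assumed): cross-references from concepts to ontology classes.
# CONCEPT_CORTEX has no entry, modelling a missing cross-reference.
CONCEPT_TO_CLASS = {
    "CONCEPT_HIPPOCAMPUS": "example:Hippocampus",
    "CONCEPT_MOUSE_SPECIES": "example:Mouse",
    "CONCEPT_COMPUTER_MOUSE": "example:InputDevice",
}


def annotate(text, use_corrections=True):
    """Return the set of ontology classes predicted for a free-text description."""
    lowered = text.lower()
    annotations = set()
    for phrase, concepts in PHRASE_TO_CONCEPTS.items():
        if phrase not in lowered:
            continue
        for concept in concepts:
            if use_corrections and (phrase, concept) in REJECTED_PAIRS:
                continue  # dropped by a phrase-to-concept correction
            ontology_class = CONCEPT_TO_CLASS.get(concept)
            if ontology_class is None:
                continue  # missed annotation: no cross-reference available
            annotations.add(ontology_class)
    return annotations


def precision(predicted, correct):
    """Fraction of predicted annotations that a reviewer judged correct."""
    if not predicted:
        return 0.0
    return len(predicted & correct) / len(predicted)


description = "Gene expression in mouse hippocampus and cortex"
reviewed_correct = {"example:Hippocampus", "example:Mouse"}

naive = annotate(description, use_corrections=False)
corrected = annotate(description, use_corrections=True)
print(precision(naive, reviewed_correct), precision(corrected, reviewed_correct))
```

In this toy setting the naive run keeps a spurious class for "mouse", while the corrected run drops it, which mirrors the kind of gap the abstract reports between the naive and corrected systems. The "cortex" phrase yields no annotation in either run because its concept lacks a cross-reference.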
