Ontology-Based Search of Genomic Metadata

Abstract

The Encyclopedia of DNA Elements (ENCODE) is a large and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007. New biological knowledge can be extracted from these largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet searching for relevant datasets for knowledge discovery is poorly supported: the metadata describing ENCODE datasets are quite simple and incomplete, and are not described by a coherent underlying ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata search approach that uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed (), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base by starting with concepts extracted from ENCODE metadata, which we matched to and expanded on biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that this inference method is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists’ queries. This allows correctly finding more datasets than those retrieved by a purely syntactic search, as supported by the other available systems. We empirically show the relevance of the found datasets to the biologists’ queries.
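
The difference between syntactic and ontology-based search described in the abstract can be illustrated with a small sketch. This is not the authors' system and does not use the Unified Medical Language System: the tiny is-a hierarchy (`IS_A`), the metadata records (`DATASETS`), and all function names are hypothetical, chosen only to show how expanding a query to its narrower ontology terms can retrieve datasets that a plain substring match misses.

```python
from collections import deque

# Hypothetical toy ontology: child term -> parent terms (is-a edges).
IS_A = {
    "k562": ["leukemia cell line"],
    "hl-60": ["leukemia cell line"],
    "leukemia cell line": ["cancer cell line"],
    "hepg2": ["cancer cell line"],
}

# Hypothetical metadata records: dataset id -> free-text description.
DATASETS = {
    "ds1": "ChIP-seq on K562",
    "ds2": "RNA-seq on HepG2",
    "ds3": "DNase-seq on GM12878",
    "ds4": "RNA-seq, leukemia cell line HL-60",
}


def descendants(term):
    """Return the term plus every narrower term reachable via is-a edges."""
    # Invert the child -> parents map into parent -> children.
    children = {}
    for child, parents in IS_A.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    # Breadth-first traversal downward from the query term.
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def syntactic_search(query):
    """Match only datasets whose metadata literally contains the query."""
    q = query.lower()
    return sorted(ds for ds, text in DATASETS.items() if q in text.lower())


def semantic_search(query):
    """Match datasets mentioning the query or any narrower ontology term."""
    terms = descendants(query.lower())
    return sorted(
        ds for ds, text in DATASETS.items()
        if any(term in text.lower() for term in terms)
    )


if __name__ == "__main__":
    print(syntactic_search("cancer cell line"))
    print(semantic_search("cancer cell line"))
```

In this toy setting, no record contains the literal string "cancer cell line", so the syntactic search finds nothing, while the expanded query reaches the records annotated with specific cell lines. A real system would also need concept extraction, synonym handling, and an index, none of which are modeled here.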
