Conference: Extended Semantic Web Conference

A Collection of Benchmark Data Sets for Knowledge Graph-Based Similarity in the Biomedical Domain

Abstract

The ability to compare entities within a knowledge graph is a cornerstone technique for several applications, ranging from the integration of heterogeneous data to machine learning. It is of particular importance in biomedical applications such as the prediction of protein-protein interactions, associations between diseases and genes, and the cellular localization of proteins. However, building a gold standard data set to support the evaluation of such comparisons is non-trivial, owing to the size, diversity and complexity of biomedical knowledge graphs. We present a collection of 21 benchmark data sets that aim to circumvent the difficulties of building benchmarks for large biomedical knowledge graphs by exploiting proxies for biomedical entity similarity. These data sets include data from two successful biomedical ontologies, the Gene Ontology and the Human Phenotype Ontology, and explore proxy similarities based on protein and gene properties. The data sets vary in size and cover four different species at different levels of annotation completion. For each data set we also provide semantic similarity computations with representative state-of-the-art measures.
