An Extensible Ontology Modeling Approach Using Post Coordinated Expressions for Semantic Provenance in Biomedical Research

Abstract

Provenance metadata describing the source or origin of data is critical for verifying and validating the results of scientific experiments. Indeed, the reproducibility of scientific studies is rapidly gaining significant attention in the research community, for example in biomedical and healthcare research. To address this challenge in the biomedical research domain, we have developed the Provenance for Clinical and Healthcare Research (ProvCaRe) project using the World Wide Web Consortium (W3C) PROV specifications, including the PROV Ontology (PROV-O). In the ProvCaRe project, we are extending PROV-O to create a formal model of the provenance information necessary for scientific reproducibility and replication in biomedical research. However, developing the ProvCaRe ontology raises several challenges: (1) ontology engineering: the set of biomedical provenance-related terms has no defined scope, so modeling all of them in the ontology before its release is not feasible; (2) redundancy: many existing biomedical ontologies already model the relevant biomedical terms; and (3) ontology maintenance: adding or deleting terms in a large ontology is error-prone, which makes the ontology difficult to maintain over time. Therefore, instead of modeling all classes and properties in the ontology before deployment (precoordination), we propose the "ProvCaRe Compositional Grammar Syntax" to model ontology classes on demand (postcoordination). The compositional grammar syntax allows us to reuse existing biomedical ontology classes and to compose provenance-specific terms that extend PROV-O classes and properties. We demonstrate this approach in the ProvCaRe ontology. We also show how the ontology is used to build the ProvCaRe knowledgebase, which consists of more than 38 million provenance triples automatically extracted from 384,802 published research articles using a text-processing workflow.
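The abstract does not spell out the ProvCaRe Compositional Grammar Syntax. The following is therefore only a minimal Python sketch of the general postcoordination idea, under stated assumptions. An expression names an existing class plus refinements, and the composed class is created only the first time it is requested; equivalent expressions then reuse it. Only PROV-O (`prov:Activity`) and the precoordination/postcoordination contrast come from the abstract. The following are hypothetical and are not the published ProvCaRe grammar:

- the expression syntax (`Focus { property = Value, ... }`)
- the `pc:` and `ex:` prefixes
- the example class and property names
- the functions `parse`, `to_triples`, and `TermRegistry`
- the choice to encode a composed class as a subclass of `owl:someValuesFrom` restrictions

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# HYPOTHETICAL syntax, not the published ProvCaRe grammar:
#   FocusClass { property = ValueClass, property = ValueClass, ... }
# All names are prefixed names (CURIEs) such as "prov:Activity".
CURIE = re.compile(r"[A-Za-z][\w-]*:[A-Za-z_][\w-]*")
EXPR = re.compile(r"\s*(\S+?)\s*(?:\{(.*)\})?\s*")

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class PostCoordinatedTerm:
    focus: str                                # existing class being refined
    refinements: Tuple[Tuple[str, str], ...]  # sorted (property, value) pairs

    def name(self) -> str:
        # Deterministic name: the same expression always yields the same class.
        parts = [self.focus] + [f"{p}-{v}" for p, v in self.refinements]
        return "pc:" + "__".join(x.replace(":", "_") for x in parts)


def _check(curie: str) -> str:
    if not CURIE.fullmatch(curie):
        raise ValueError(f"not a prefixed name: {curie!r}")
    return curie


def parse(expr: str) -> PostCoordinatedTerm:
    m = EXPR.fullmatch(expr)
    if m is None:
        raise ValueError(f"malformed expression: {expr!r}")
    focus, body = _check(m.group(1)), m.group(2)
    refinements = []
    if body and body.strip():
        for pair in body.split(","):
            prop, sep, value = pair.partition("=")
            if not sep:
                raise ValueError(f"missing '=' in refinement: {pair.strip()!r}")
            refinements.append((_check(prop.strip()), _check(value.strip())))
    # Sorting makes the expression order-insensitive.
    return PostCoordinatedTerm(focus, tuple(sorted(refinements)))


def to_triples(term: PostCoordinatedTerm) -> List[Triple]:
    """Encode the composed class as a subclass of its focus class and of one
    owl:someValuesFrom restriction per refinement (one possible OWL encoding)."""
    name = term.name()
    triples = [(name, "rdf:type", "owl:Class"),
               (name, "rdfs:subClassOf", term.focus)]
    for i, (prop, value) in enumerate(term.refinements):
        node = f"_:{name.split(':', 1)[1]}_r{i}"  # blank node for the restriction
        triples += [(node, "rdf:type", "owl:Restriction"),
                    (node, "owl:onProperty", prop),
                    (node, "owl:someValuesFrom", value),
                    (name, "rdfs:subClassOf", node)]
    return triples


class TermRegistry:
    """Creates composed classes only when first requested (postcoordination)."""

    def __init__(self) -> None:
        self._terms: Dict[str, List[Triple]] = {}

    def get(self, expr: str) -> str:
        term = parse(expr)
        name = term.name()
        if name not in self._terms:  # compose on demand, then reuse
            self._terms[name] = to_triples(term)
        return name

    def triples(self) -> List[Triple]:
        return [t for ts in self._terms.values() for t in ts]


if __name__ == "__main__":
    reg = TermRegistry()
    a = reg.get("prov:Activity { ex:hasStudyType = ex:CohortStudy, "
                "ex:usedInstrument = ex:Actigraphy }")
    b = reg.get("prov:Activity{ex:usedInstrument=ex:Actigraphy, ex:hasStudyType=ex:CohortStudy}")
    print(a == b, len(reg.triples()))  # prints: True 10
```

Both spellings of the expression resolve to a single composed class. That class is created once, with two restriction subclass axioms, for 10 triples in total. In this sketch, refinements are sorted before naming so that equivalent expressions are not duplicated. This is one way to address the redundancy and maintenance concerns the abstract raises.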
