An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper)

Abstract

Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of the biomedical domain and the lack of appropriate natural language processing (NLP) techniques. High-quality domain ontologies model both data and metadata at a fine level of granularity, which can be used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task in support of scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement, as highlighted by the recent US National Institutes of Health initiative called "Principles of Rigor and Reproducibility". In this paper, we describe an effective approach to extracting provenance metadata from published biomedical research literature using an ontology-enabled NLP platform as part of the Provenance for Clinical and Healthcare Research (ProvCaRe) project. The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of the ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation demonstrate that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata than existing NLP pipelines such as MetaMap.
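
The abstract's two technical ideas, ontology-driven term recognition and evaluation by recall, can be illustrated with a minimal Python sketch. This is not the ProvCaRe-NLP or cTAKES implementation: the toy ontology, its terms and class labels, the example sentence, and the helper names `annotate` and `recall` are all assumptions made for illustration only.

```python
import re

# Hypothetical mini "ontology": surface term -> provenance class label.
# Terms and labels are illustrative; they are not taken from the ProvCaRe ontology.
TOY_ONTOLOGY = {
    "polysomnography": "DataCollectionMethod",
    "randomized controlled trial": "StudyDesign",
    "body mass index": "Variable",
}


def annotate(text, ontology):
    """Return the set of (term, label) pairs whose term occurs in the text."""
    lowered = text.lower()
    found = set()
    for term, label in ontology.items():
        # Word boundaries avoid matching a term inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add((term, label))
    return found


def recall(predicted, gold):
    """Fraction of gold-standard annotations that were also predicted."""
    if not gold:
        return 0.0
    return len(predicted & gold) / len(gold)


# Made-up sentence and gold annotations, used only to exercise the functions.
sentence = (
    "Participants in the randomized controlled trial underwent "
    "polysomnography at baseline."
)
gold = {
    ("randomized controlled trial", "StudyDesign"),
    ("polysomnography", "DataCollectionMethod"),
}

predicted = annotate(sentence, TOY_ONTOLOGY)
print(recall(predicted, gold))
```

A real pipeline would add tokenization, normalization, and disambiguation on top of this kind of dictionary lookup; the sketch only shows how ontology terms can drive annotation and how recall is computed against a manually annotated gold standard.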

Bibliographic details

Source
International conference on the move to meaningful internet systems; Conference on cooperative information systems; Conference on cloud and trusted computing; Conference on ontologies, databases, and applications of semantics | 2016 | pp. 699-708 | 10 pages
Authors
Joshua Valdez; Michael Rueschman; Matthew Kim; Susan Redline; Satya S. Sahoo
Original format
PDF
Keywords
Ontology-based natural language processing; Provenance metadata; Scientific reproducibility; Named entity recognition

Similar literature

1. Semantic similarity of short texts in languages with a deficient natural language processing support [J]. Bojan Furlan, Vuk Batanovic, Bosko Nikolic. Decision Support Systems. 2013, No. 3
2. Using rule-based natural language processing to improve disease normalization in biomedical text [J]. Kang N., Singh B., Afzal Z., et al. Journal of the American Medical Informatics Association. 2013, No. 5
3. A corpus of full-text journal articles is a robust evaluation tool for revealing differences in performance of biomedical natural language processing tools [J]. Karin Verspoor, Kevin B Cohen, Arrick Lanfranchi, et al. BMC Bioinformatics. 2012, No. 1
4. An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper) [C]. Joshua Valdez, Michael Rueschman, Matthew Kim, et al. OnTheMove Confederated International Conferences. 2016
5. Semantic metadata extraction from open domain texts in natural language [D]. Cordoba Rodas, Angie Paola. 2013
6. An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper) [O]. Joshua Valdez, Michael Rueschman, Matthew Kim, et al.
7. A Year of Papers Using Biomedical Texts: Findings from the Section on Natural Language Processing of the IMIA Yearbook [O]. Natalia Grabar, Cyril Grouin. 2019
