Conference: Workshop on Biomedical Natural Language Processing 2015

Extracting Disease-Symptom Relationships by Learning Syntactic Patterns from Dependency Graphs


Abstract

Disease-symptom relationships are of primary importance for biomedical informatics, but the databases that catalog them are incomplete compared with the state of the art available in the scientific literature. In this paper we propose a novel method for automatically extracting disease-symptom relationships from text, called SPARE (Syntactic PAttern for Relationship Extraction). The method comprises three successive steps: first, we learn patterns from dependency graphs; second, we select the best patterns based on their quality and their specificity (their ability to identify only disease-symptom relationships); finally, we apply the selected patterns to new texts to extract disease-symptom relationships. We evaluated SPARE on a corpus of 121,796 PubMed abstracts related to 457 rare diseases. Extraction quality was evaluated as a function of pattern quality and specificity. The best F-measure obtained is 55.65% (for specificity ≥ 0.5 and quality ≥ 0.5). To provide insight into the novelty of the extracted disease-symptom relationships, we compare our results with the content of phenotype databases (OrphaData and OMIM). Our results show the feasibility of automatically extracting disease-symptom relationships, including true relationships that were not already referenced in phenotype databases and that may involve complex symptom descriptions.
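
The second step and the evaluation metric mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how quality and specificity are computed, so the sketch assumes both are precomputed scores in [0, 1]. The `Pattern` class, the dependency-path strings, the function names, and the use of the balanced F-measure are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """A learned syntactic pattern with two precomputed scores (hypothetical fields)."""
    path: str           # illustrative dependency path between a disease and a symptom
    quality: float      # assumed to lie in [0, 1]
    specificity: float  # assumed to lie in [0, 1]


def select_patterns(patterns, min_quality=0.5, min_specificity=0.5):
    """Keep only the patterns whose scores meet both thresholds."""
    return [
        p for p in patterns
        if p.quality >= min_quality and p.specificity >= min_specificity
    ]


def f_measure(true_positives, false_positives, false_negatives):
    """Balanced F-measure: harmonic mean of precision and recall (an assumption)."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Made-up candidate patterns; the scores are not taken from the paper.
candidates = [
    Pattern("DISEASE <-nsubj- cause -dobj-> SYMPTOM", quality=0.8, specificity=0.7),
    Pattern("DISEASE <-nmod- patient -nsubj-> SYMPTOM", quality=0.6, specificity=0.3),
    Pattern("SYMPTOM <-conj- DISEASE", quality=0.4, specificity=0.9),
]
selected = select_patterns(candidates)
```

With the thresholds reported in the abstract (specificity ≥ 0.5 and quality ≥ 0.5), only the first made-up pattern is kept. Raising either threshold would typically trade recall for precision, which is why the abstract reports the F-measure for a specific pair of threshold values.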

Bibliographic details

  • Conference location: Beijing(CA)
  • Author affiliation: LORIA (CNRS, Inria, Universite de Lorraine), Campus scientifique, Vandoeuvre-les-Nancy, F-54506, France (listed four times in the record)
  • Original format: PDF
  • Language of the full text: English
  • Date added to database: 2022-08-26 14:23:26
