
An Unsupervised Text Mining Method for Relation Extraction from Biomedical Literature



Abstract

The wealth of interaction information in biomedical articles has motivated text mining approaches that extract biomedical relations automatically. This paper presents an unsupervised method for biomedical relation extraction based on pattern clustering and sentence parsing. The pattern clustering algorithm is based on a polynomial kernel method and identifies interaction words from unlabeled data; these interaction words are then used to extract relations between entity pairs. Dependency parsing and phrase structure parsing are combined for relation extraction. Based on the semi-supervised KNN algorithm, we extend the proposed unsupervised approach to a semi-supervised approach by combining pattern clustering, dependency parsing, and phrase structure parsing rules. We evaluated the approaches on two tasks: (1) protein–protein interaction extraction and (2) gene–suicide association extraction. On task (1), evaluation on the benchmark AImed corpus showed that the proposed unsupervised approach outperformed three supervised methods: one rule-based, one SVM-based, and one kernel-based. The proposed semi-supervised approach is superior to existing semi-supervised methods. On task (2), evaluation on a smaller dataset from the Genetic Association Database and a larger dataset from publicly available PubMed showed that the proposed unsupervised and semi-supervised methods achieved much higher F-scores than a co-occurrence-based method.
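The abstract compares the proposed methods against a co-occurrence-based baseline using F-scores. As a rough illustration only, the sketch below assumes that such a baseline predicts a relation for every pair of entities mentioned in the same sentence, and scores the predictions with the standard precision, recall, and F-score definitions. The function names, the entity labels, and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def cooccurrence_pairs(sentences):
    """Predict a relation for every pair of distinct entities in a sentence.

    `sentences` is a list of (sentence_id, entity_list) tuples
    (a hypothetical input format chosen for this sketch).
    """
    predicted = set()
    for sent_id, entities in sentences:
        # Sort so that each unordered pair has one canonical form.
        for a, b in combinations(sorted(set(entities)), 2):
            predicted.add((sent_id, a, b))
    return predicted


def precision_recall_f1(predicted, gold):
    """Standard precision, recall, and balanced F-score over two sets."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (hypothetical): two sentences with tagged entities.
sentences = [
    (0, ["P1", "P2", "P3"]),
    (1, ["P4", "P5"]),
]
# Hypothetical gold-standard relations, as canonical sorted pairs.
gold = {(0, "P1", "P2"), (1, "P4", "P5")}

predicted = cooccurrence_pairs(sentences)
precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under this assumption, the baseline finds every gold relation that sits within one sentence but also predicts every non-interacting pair, so its recall is high and its precision is low. That is the usual reason a co-occurrence baseline yields a modest F-score.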

Bibliographic details

  • Journal name: other
  • Year (volume), issue: -1(9), 7
  • Year: -1
  • Pages: e102039
  • Total pages: 8
  • Original format: PDF
