Conference: Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

An Automatic Unsupervised Querying Algorithm for Efficient Information Extraction in Biomedical Domain



Abstract

In bioinformatics, extracting a relation such as protein-protein interactions from a large database of text documents is a challenging task. One major issue in biomedical information extraction is how to process the sheer volume of unstructured biomedical text efficiently. Often, only a small fraction of the documents in such a corpus contain information relevant to the extraction task. We propose a novel query expansion algorithm that automatically discovers the characteristics of documents that are useful for extracting a target relation. Our technique introduces a hybrid query re-weighting algorithm that combines a modified Robertson-Sparck Jones query ranking algorithm with a keyphrase extraction algorithm. It also adopts a novel query translation technique that incorporates part-of-speech (POS) categories into query translation. We conduct a series of experiments and report the results. They show that our technique retrieves more documents containing protein-protein pairs from MEDLINE as the number of iterations increases. We also compare our technique with SLIPPER, a supervised rule-based query expansion technique. The results show that our technique outperforms SLIPPER by 17.90% to 29.98% over four iterations.
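The re-weighting step mentioned in the abstract builds on Robertson-Sparck Jones relevance weighting. As a rough illustration only, the sketch below scores candidate expansion terms with the classic, unmodified weight using 0.5 smoothing. It is not the authors' modified variant, and it leaves out the keyphrase extraction and POS-based query translation components. The function names `rsj_weight` and `rank_expansion_terms` and the toy documents are hypothetical, and the relevant documents are assumed to be a subset of the full collection.

```python
import math
from collections import Counter


def rsj_weight(r, R, n, N):
    """Classic Robertson-Sparck Jones weight with 0.5 smoothing.

    r: relevant documents containing the term
    R: total relevant documents
    n: documents in the collection containing the term
    N: total documents in the collection
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


def rank_expansion_terms(relevant_docs, all_docs, top_k=5):
    """Rank terms seen in the relevant documents by their RSJ weight.

    Each document is a list of tokens; relevant_docs is assumed to be
    a subset of all_docs (an assumption of this sketch).
    """
    R, N = len(relevant_docs), len(all_docs)
    df_all = Counter(t for doc in all_docs for t in set(doc))
    df_rel = Counter(t for doc in relevant_docs for t in set(doc))
    scores = {t: rsj_weight(df_rel[t], R, df_all[t], N) for t in df_rel}
    # Highest weight first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy collection (illustrative data, not taken from MEDLINE).
all_docs = [
    ["protein", "binds", "kinase"],
    ["protein", "interacts", "receptor"],
    ["weather", "report"],
    ["stock", "market", "report"],
]
relevant_docs = all_docs[:2]
top_terms = rank_expansion_terms(relevant_docs, all_docs, top_k=3)
```

In an iterative setting, the top-ranked terms would be added to the query and the retrieval repeated. How the paper combines these weights with keyphrase scores is not stated in the abstract.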
