Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD 2005); May 18-20, 2005; Hanoi, Vietnam

An Automatic Unsupervised Querying Algorithm for Efficient Information Extraction in Biomedical Domain



Abstract

In bioinformatics, extracting a relation such as protein-protein interactions from a large database of text documents is a challenging task. A major issue in biomedical information extraction is how to efficiently process the sheer volume of unstructured biomedical text. Often only a small fraction of the documents in these huge corpora contain information relevant to the extraction task. We propose a novel query expansion algorithm that automatically discovers the characteristics of documents that are useful for extracting a target relation. Our technique introduces a hybrid query re-weighting algorithm that combines a modified Robertson Sparck-Jones query ranking algorithm with a keyphrase extraction algorithm. It also adopts a novel query translation technique that incorporates part-of-speech (POS) categories into query translation. We conduct a series of experiments and report the results, which show that our technique retrieves more documents containing protein-protein pairs from MEDLINE as the number of iterations increases. We also compare our technique with SLIPPER, a supervised rule-based query expansion technique: our technique outperforms SLIPPER by 17.90% to 29.98% over four iterations.
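For readers unfamiliar with Robertson Sparck-Jones (RSJ) term weighting, the sketch below shows only the standard RSJ relevance weight with 0.5 smoothing and a common way of using it to rank query expansion terms (the offer weight, r times w). It is not the authors' method: the abstract does not say how their RSJ variant is modified or how it is combined with keyphrase extraction, so both are left out. The assumptions are that documents are bags of tokens and that "useful" documents are those from which the target relation was extracted. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def rsj_weight(r, n, R, N):
    """Standard Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: useful documents containing the term
    n: documents containing the term
    R: useful documents in the sample
    N: all documents in the sample
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def rank_expansion_terms(useful_docs, other_docs, top_k=5):
    """Hypothetical helper: rank candidate terms by offer weight r * w.

    Only terms that occur in at least one useful document are candidates.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    docs = [set(d) for d in useful_docs] + [set(d) for d in other_docs]
    N, R = len(docs), len(useful_docs)
    df = Counter(t for d in docs for t in d)        # n for every term
    rf = Counter(t for d in docs[:R] for t in d)    # r for terms in useful docs
    scores = {t: rf[t] * rsj_weight(rf[t], df[t], R, N) for t in rf}
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

# Toy tokenized abstracts, invented for illustration only.
useful = [["protein", "binds", "kinase"],
          ["protein", "interacts", "with", "receptor"],
          ["binds", "protein", "complex"]]
other = [["cell", "growth", "study"],
         ["protein", "expression", "levels"],
         ["clinical", "trial", "results"]]

print(rank_expansion_terms(useful, other, top_k=3))
```

In the toy data, "protein" occurs in every useful document and in one other document, so it ranks above "binds". The top-ranked terms would then be added to the next query. An iterative scheme like the one the abstract describes would repeat this after each retrieval round.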
