Conference: ACM International Conference on Information and Knowledge Management

Extracting Cross References from Life Science Databases for Search Result Ranking



Abstract

Scholars in the life sciences have to process huge amounts of data in a disciplined and efficient way. These data are spread across thousands of databases that overlap in content but differ substantially in interface, format, and data structure. Search engines could assist in retrieving data from these structured sources, but they fall short of ranking results by relevance in a way that reflects the needs of life science scholars. One such need is insight into the cross-references among entities in the databases: search hits with many cross-references are expected to be more informative than those with few. In this work, we investigate to what extent this expectation holds. We propose BioXRef, a method that extracts cross-references from multiple life science databases by combining targeted crawling, pointer chasing, sampling, and information extraction. We study the retrieval quality of our method and the relationship between a manually crafted relevance ranking and a relevance ranking based on cross-references, and we report first, promising results.
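The abstract does not say how hits are scored or how the two rankings are compared. As an illustration only, here is a minimal sketch that ranks search hits by their number of cross-references and measures agreement with a manual ranking using Kendall's tau, a standard rank-correlation coefficient. Kendall's tau is our assumption, not necessarily the measure used with BioXRef; the hit IDs, cross-reference counts, and manual ranking are hypothetical, and the extraction step (crawling, pointer chasing, sampling) is not modeled.

```python
from itertools import combinations


def rank_by_xrefs(xref_counts):
    """Order hit IDs by descending cross-reference count (ties broken by ID).

    xref_counts is a hypothetical mapping {hit_id: number_of_cross_references},
    standing in for whatever an extraction step would produce.
    """
    return sorted(xref_counts, key=lambda hit: (-xref_counts[hit], hit))


def kendall_tau(ranking_a, ranking_b):
    """Kendall's tau-a between two rankings of the same items (no ties within a ranking).

    Assumption: Kendall's tau is used here for illustration; the abstract
    does not name a comparison measure.
    """
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    if set(pos_a) != set(pos_b):
        raise ValueError("rankings must contain the same items")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two items to compare rankings")
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):
        # A pair is concordant if both rankings order x and y the same way.
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


if __name__ == "__main__":
    # Hypothetical cross-reference counts for four search hits.
    counts = {"hit_A": 12, "hit_B": 3, "hit_C": 7, "hit_D": 0}
    xref_ranking = rank_by_xrefs(counts)
    print(xref_ranking)  # ['hit_A', 'hit_C', 'hit_B', 'hit_D']

    # Hypothetical manually crafted relevance ranking of the same hits.
    manual_ranking = ["hit_A", "hit_B", "hit_C", "hit_D"]
    print(f"{kendall_tau(xref_ranking, manual_ranking):.3f}")  # 0.667
```

A tau near 1 means the cross-reference ranking largely agrees with the manual one, and a value near 0 means little agreement. For real evaluations where hits may share the same count, `scipy.stats.kendalltau` handles ties and is a sturdier choice than this hand-rolled tau-a.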
