Published in: Chinese Journal of Computers (《计算机学报》)

Keyword-Based Deep Web Database Selection

         

Abstract

This paper proposes a keyword-based Deep Web search method. Users submit their queries as keywords, and the method selects, on the fly, the Web databases that capture the query intent and provide high-quality results. The method avoids Deep Web crawling, which is costly and difficult, and it supports keyword search over Deep Web databases spanning multiple domains, so it can be integrated seamlessly with existing search engine architectures. The paper focuses on keyword-based Deep Web database selection and addresses the challenges this raises in two ways. (1) It introduces a relevance model that measures the association between keywords and domain attributes, and designs a random-walk algorithm that discovers latent keyword-attribute associations from database query logs. (2) It develops a new data sampling method and uses it in a sampling-based model of database-query relevance, which selects the relevant databases within the chosen domains. Experiments on real data sets from the Chinese Deep Web show that the proposed methods effectively select databases relevant to keyword queries and provide high-quality results.
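The abstract does not spell out the random-walk algorithm. As a rough illustration only, the sketch below assumes a query log of (keyword, attribute) pairs recording which form attribute a keyword was submitted to, builds a weighted keyword-attribute bipartite graph from it, and runs a random walk with restart from the query keyword; the visiting probabilities on attribute nodes then serve as relevance scores. The log format, the restart probability of 0.15, and the names `build_graph`, `random_walk_with_restart`, and `attribute_relevance` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_graph(log):
    """Build a weighted keyword-attribute bipartite graph.

    Hypothetical log format: an iterable of (keyword, attribute) pairs,
    one per logged form submission. Edge weight = co-occurrence count.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for keyword, attribute in log:
        k, a = ("kw", keyword), ("attr", attribute)
        graph[k][a] += 1.0  # edges are symmetric, so every node has a neighbor
        graph[a][k] += 1.0
    return graph


def random_walk_with_restart(graph, start, restart=0.15, iters=50):
    """Power iteration for a random walk that jumps back to `start`
    with probability `restart` at every step (assumed parameters)."""
    nodes = list(graph)
    prob = {n: 0.0 for n in nodes}
    prob[start] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[start] += restart
        for node, p in prob.items():
            if p == 0.0:
                continue
            total = sum(graph[node].values())
            # Spread the remaining mass to neighbors in proportion to edge weight.
            for nbr, w in graph[node].items():
                nxt[nbr] += (1.0 - restart) * p * w / total
        prob = nxt
    return prob


def attribute_relevance(graph, keyword, **kwargs):
    """Normalized keyword-to-attribute relevance scores (hypothetical helper)."""
    start = ("kw", keyword)
    if start not in graph:  # keyword never seen in the log
        return {}
    prob = random_walk_with_restart(graph, start, **kwargs)
    scores = {node[1]: p for node, p in prob.items() if node[0] == "attr"}
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()} if total else scores
```

With a log in which "nokia" was submitted twice to `brand` and once to `model`, `attribute_relevance(build_graph(log), "nokia")` ranks `brand` above `model`, while attributes reachable only from unrelated keywords score zero. A walk with restart is one plausible choice here because multi-hop paths (keyword to attribute to another keyword to another attribute) can surface associations that never co-occur directly in the log.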


