Published in: 2009 IEEE 25th International Conference on Data Engineering (ICDE)

Effective XML Keyword Search with Relevance Oriented Ranking

Abstract

Inspired by the great success of information retrieval (IR) style keyword search on the web, keyword search on XML has emerged recently. The differences between text databases and XML databases result in three new challenges: (1) Identifying the user's search intention, i.e., identifying the XML node types that the user wants to search for and search via. (2) Resolving keyword ambiguity problems: a keyword can appear as both a tag name and a text value of some node, and a keyword can appear as the text values of different XML node types and carry different meanings. (3) As the search results are sub-trees of the XML document, a new scoring function is needed to estimate their relevance to a given query. However, existing methods cannot resolve these challenges and thus return results of low quality in terms of query relevance. In this paper, we propose an IR-style approach that utilizes the statistics of the underlying XML data to address these challenges. We first propose specific guidelines that a search engine should meet in both search intention identification and relevance-oriented ranking of search results. Then, based on these guidelines, we design novel formulae to identify the search-for nodes and search-via nodes of a query, and present a novel XML TF*IDF ranking strategy to rank the individual matches of all possible search intentions. Lastly, the proposed techniques are implemented in an XML keyword search engine called XReal, and extensive experiments show the effectiveness of our approach.
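
The abstract does not give the formulae themselves, so the following is only a minimal sketch of the kind of per-node-type statistics such an approach could draw on. It is not XReal's implementation. The sample document, the function names (`subtree_terms`, `node_type_stats`, `match_kinds`, `type_idf`), the definition of a node type as the tag path from the root, and the textbook-style logarithmic IDF are all assumptions made for illustration.

```python
import math
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; not taken from the paper.
SAMPLE_XML = """
<bib>
  <author>
    <name>Smith</name>
    <paper><title>XML keyword search</title><year>2009</year></paper>
    <paper><title>Ranking on the web</title><year>2008</year></paper>
  </author>
  <author>
    <name>Jones</name>
    <paper><title>XML ranking paper</title><year>2009</year></paper>
  </author>
</bib>
"""


def subtree_terms(elem):
    """Lowercased terms in a subtree: tag names plus whitespace-split text.

    Tail text (text following a child's closing tag) is ignored here.
    """
    terms = set()
    for node in elem.iter():
        terms.add(node.tag.lower())
        if node.text:
            terms.update(node.text.lower().split())
    return terms


def node_type_stats(root):
    """Return (counts, df) keyed by node type (assumed: tag path from root).

    counts[T]    = number of nodes of type T
    df[T][term]  = number of T-typed nodes whose subtree contains the term
    """
    counts = defaultdict(int)
    df = defaultdict(lambda: defaultdict(int))

    def visit(elem, path):
        node_type = path + "/" + elem.tag
        counts[node_type] += 1
        for term in subtree_terms(elem):
            df[node_type][term] += 1
        for child in elem:
            visit(child, node_type)

    visit(root, "")
    return counts, df


def match_kinds(root, keyword):
    """Say whether a keyword matches a tag name, a text value, or both."""
    kw = keyword.lower()
    kinds = set()
    for node in root.iter():
        if node.tag.lower() == kw:
            kinds.add("tag")
        if node.text and kw in node.text.lower().split():
            kinds.add("text")
    return kinds


def type_idf(counts, df, node_type, keyword):
    """Textbook-style IDF within one node type (an assumption, not the paper's formula)."""
    n = counts[node_type]
    d = df[node_type].get(keyword.lower(), 0)
    return math.log(n / d) if d else 0.0


root = ET.fromstring(SAMPLE_XML)
counts, df = node_type_stats(root)

# Statistics per node type for one keyword.
for node_type in sorted(counts):
    print(node_type, counts[node_type], df[node_type].get("xml", 0))

# Keyword ambiguity: "paper" occurs both as a tag name and as a text value.
print(sorted(match_kinds(root, "paper")))
```

Keeping the counts per node type rather than per document is the point of the sketch: the same keyword can be frequent under one node type and rare under another, which is the sort of signal a statistics-driven approach could use when guessing which node type a query is searching for.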