International Journal on Digital Libraries
Examining topic shifts in content-oriented XML retrieval

Abstract

Content-oriented XML retrieval systems support access to XML repositories by retrieving, in response to user queries, XML document components (XML elements) instead of whole documents. The retrieved XML elements should not only contain information relevant to the query but also be at the right level of granularity. In INEX, the INitiative for the Evaluation of XML retrieval, a relevant element is defined to be at the right level of granularity if it is both exhaustive and specific to the query. Specificity was introduced precisely to capture how focused an element is on the query (i.e., whether it avoids discussing other, irrelevant topics). To score XML elements according to how exhaustive and specific they are with respect to a query, both the content and the logical structure of XML documents have been widely used. One source of evidence that has led to promising results in terms of retrieval effectiveness is element length. This work examines a new source of evidence derived from the semantic decomposition of XML documents. We assume that XML documents can be semantically decomposed by applying a topic segmentation algorithm. Using this semantic decomposition together with the logical structure of XML documents, we propose a new source of evidence, the number of topic shifts in an element, to reflect its relevance and, more particularly, its specificity. This paper has three research objectives. First, we investigate which characteristics of XML elements are reflected by their number of topic shifts. Second, we compare topic shifts with element length by incorporating each as a feature in a retrieval setting and examining its effect on estimating the relevance of XML elements to a query. Finally, we use the number of topic shifts as evidence of specificity in order to provide focused access to XML repositories.
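The abstract describes the proposed feature only at a high level: topic boundaries obtained by segmenting a document are overlaid on its XML element hierarchy, and the number of topic shifts inside each element is counted. The sketch below shows one plausible way to compute that count, alongside element length for comparison. It is an illustration under stated assumptions, not the authors' implementation: text is assumed to live only in leaf elements, each leaf is treated as one text unit, and the topic boundaries are supplied as a hypothetical, pre-computed list of unit indices (the segmentation step itself, for example a TextTiling-style algorithm, is not shown). The function name `element_features` and the XPath-like path keys are illustrative choices.

```python
import bisect
import xml.etree.ElementTree as ET


def element_features(xml_text, boundaries):
    """Return {path: (length, topic_shifts)} for every element, in document order.

    Illustrative assumptions (not the paper's exact definitions):
    - text lives in leaf elements, and each leaf element is one text unit;
    - `boundaries` is a sorted list of unit indices where a new topic segment
      starts, as produced by some topic segmentation algorithm (not shown).
    """
    root = ET.fromstring(xml_text)
    features = {}
    counter = 0  # index of the next leaf unit in document order

    def walk(elem, path):
        nonlocal counter
        start = counter
        features[path] = None  # reserve the key so output follows document order
        children = list(elem)
        if not children:
            counter += 1  # a leaf element is a single text unit
        seen = {}
        for child in children:
            seen[child.tag] = seen.get(child.tag, 0) + 1
            walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")
        end = counter  # element covers leaf units [start, end)
        # Boundaries strictly inside (start, end) are topic shifts within the
        # element; a boundary at `start` is where the element itself begins.
        shifts = bisect.bisect_left(boundaries, end) - bisect.bisect_right(boundaries, start)
        length = len(" ".join(elem.itertext()).split())  # element length in terms
        features[path] = (length, shifts)

    walk(root, f"/{root.tag}[1]")
    return features


doc = """<article>
  <sec><p>XML retrieval returns elements.</p><p>Elements vary in size.</p></sec>
  <sec><p>Segmentation splits text into topics.</p><p>TextTiling uses lexical cohesion.</p></sec>
</article>"""

# Hypothetical segmenter output: new topics start at leaf units 2 and 3.
for path, (length, shifts) in element_features(doc, [2, 3]).items():
    print(path, length, shifts)
# Includes, e.g.:
# /article[1]/sec[1] 8 0
# /article[1]/sec[2] 9 1
```

In this toy document the two `sec` elements have similar lengths (8 and 9 terms) but different topic-shift counts (0 and 1), which shows how the two features can distinguish elements differently. Counting a boundary only when it falls strictly inside an element is a design choice of this sketch: a shift that coincides with an element's start is treated as a change between elements rather than within one.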