Journal of Information Science

Top-K data source selection for keyword queries over multiple XML data sources


Abstract

With the proliferation of XML data, searching XML data using keyword queries has attracted much attention. However, most current approaches focus on keyword-based search over a single XML document. Searching a system that integrates hundreds or even thousands of data sources by sequentially querying every single source is extremely costly, and thus may be impractical. In this article we propose a novel approach for selecting the top-K data sources based on their relevance to a given query, which avoids the high cost of searching numerous, potentially irrelevant data sources. Our approach summarizes the data sources as succinct synopses so that non-promising sources can be filtered out rapidly. We maintain both structural and value distribution information for each data source, and propose a novel ranking function to effectively measure the relevance of a data source to the given query. We conducted experiments with real datasets, and the results show that our approach achieves high performance on all evaluation metrics (recall, precision and Spearman's rank correlation coefficient) under different experimental parameters.
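The filter-then-rank idea can be made concrete with a minimal Python sketch of top-K source selection over per-source synopses. The abstract does not specify the authors' synopsis format or ranking function, so this sketch does not reproduce them. The following are all hypothetical choices made for illustration:

- the names `build_synopsis`, `score` and `top_k`;
- the synopsis layout, in which each keyword maps to a count of the element paths it appears under;
- the rule that a source missing any query keyword is filtered out;
- the score, which adds log term frequency as a "value" component to a bonus awarded when all keywords share an element path, a crude "structural" component.

```python
import heapq
import math
from collections import Counter
from typing import Iterable, Optional

# HYPOTHETICAL synopsis layout (not the paper's): keyword -> Counter of the
# element paths (e.g. "/lib/book/title") whose text contains that keyword.
Synopsis = dict[str, Counter]


def build_synopsis(records: Iterable[tuple[str, str]]) -> Synopsis:
    """Summarize a source given as (element_path, text) pairs."""
    syn: Synopsis = {}
    for path, text in records:
        for word in text.lower().split():
            syn.setdefault(word, Counter())[path] += 1
    return syn


def score(syn: Synopsis, query: list[str]) -> Optional[float]:
    """Return a relevance score, or None if the source is filtered out."""
    per_kw = []
    for kw in query:
        paths = syn.get(kw.lower())
        if not paths:
            return None  # assumed rule: every keyword must occur in the source
        per_kw.append(paths)
    # "Value" part (assumption): sum of log(1 + total frequency) per keyword.
    value = sum(math.log1p(sum(p.values())) for p in per_kw)
    # "Structural" part (assumption): +1 if all keywords share an element path.
    shared = set(per_kw[0]).intersection(*per_kw[1:])
    return value + (1.0 if shared else 0.0)


def top_k(sources: dict[str, Synopsis], query: list[str], k: int) -> list[str]:
    """Rank the surviving sources and keep the k best (ties broken by name)."""
    if not query:
        return []
    scored = []
    for name, syn in sources.items():
        s = score(syn, query)
        if s is not None:
            scored.append((s, name))
    return [name for _, name in heapq.nlargest(k, scored)]


# Toy example data (invented for illustration only).
sources = {
    "books": build_synopsis([("/lib/book/title", "XML keyword search"),
                             ("/lib/book/author", "Smith")]),
    "news": build_synopsis([("/feed/item/title", "keyword trends"),
                            ("/feed/item/body", "xml is old")]),
    "music": build_synopsis([("/cds/cd/title", "Blue")]),
}

print(top_k(sources, ["xml", "keyword"], 2))  # ['books', 'news']
```

In the toy data, "music" is pruned by the synopsis check alone, before any scoring. "books" outranks "news" because its two keywords co-occur under the same element path. Only the synopses are consulted, never the underlying XML documents, which is the cost saving the abstract argues for.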
