Journal of Parallel and Distributed Computing

A distributed selectivity-driven search strategy for semi-structured data over DHT-based networks

Abstract

Distributed Hash Tables (DHTs) are widely used for indexing and locating many types of resources, including semi-structured data modeled as XML documents. A common distributed strategy for processing an XML query over a DHT is to split it into a set of simple path queries and resolve each of them separately. The traffic generated by this strategy grows with the number of paths in the query. To overcome this drawback, an alternative strategy resolves only the sub-query associated with the most selective path and then submits the original query to the nodes in the result set. A first goal of this paper is to provide an analytical and experimental study of the two strategies to assess their relative merits in different scenarios. On the basis of this study, we introduce an Adaptive Path Selection (APS) search technique that resolves an XML query in a distributed way by querying either the most selective path or the whole path set, depending on the selectivity of the paths in the query. Using APS effectively requires that querying nodes know the selectivity of all paths in advance. Addressing this problem is the other goal of the paper, achieved through: (i) the definition of a space-efficient data structure, the Path Selectivity Table (PST), which, given any path, returns an estimate of its selectivity; and (ii) the definition of an efficient strategy that builds the PST in a distributed way and propagates it to all nodes in the network with logarithmic performance bounds and without redundant messages. Experimental results show that the PST accurately estimates path selectivity values, and that the traffic generated by APS using PST-estimated selectivity values is comparable to that produced by APS when the real path selectivity values are known.
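The abstract does not give the APS decision rule or the PST layout, so the following Python sketch only illustrates the general idea under loudly stated assumptions. The cost model in `strategy_costs` (one DHT lookup of `hops` messages per path, one message per matching document, and three messages per match for the forward-and-answer step) is hypothetical, not the paper's analysis. `CountMinPathTable` is likewise a hypothetical stand-in for the PST built on a count-min sketch, a swapped-in technique chosen only because it is a well-known compact estimator that never undercounts. All names (`CountMinPathTable`, `strategy_costs`, `adaptive_path_selection`) are illustrative.

```python
import hashlib
import math
from typing import Callable, Dict, List, Tuple


class CountMinPathTable:
    """Hypothetical stand-in for a Path Selectivity Table (PST).

    Uses a count-min sketch (a swapped-in technique, not the paper's PST
    layout): memory is fixed at width * depth counters regardless of how
    many distinct paths are inserted, and estimates never undercount.
    """

    def __init__(self, width: int = 256, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, path: str):
        # One hash per row, obtained by salting BLAKE2b with the row index.
        for row in range(self.depth):
            h = hashlib.blake2b(path.encode(), salt=row.to_bytes(16, "little"))
            yield row, int.from_bytes(h.digest()[:8], "little") % self.width

    def add(self, path: str, count: int = 1) -> None:
        # Record `count` documents that contain `path`.
        for row, col in self._cells(path):
            self.rows[row][col] += count

    def estimate(self, path: str) -> int:
        # Collisions only inflate counters, so the row minimum is the tightest bound.
        return min(self.rows[row][col] for row, col in self._cells(path))


def strategy_costs(selectivity: Dict[str, int], hops: int) -> Tuple[int, int]:
    """Message counts under a hypothetical cost model (an assumption).

    all-paths:      one DHT lookup (`hops` messages) per path, plus one
                    result message per matching document for every path.
    most-selective: one lookup for the most selective path, its result
                    messages, then the full query forwarded to each match
                    and one answer back from each.
    """
    all_paths = len(selectivity) * hops + sum(selectivity.values())
    s_min = min(selectivity.values())
    most_selective = hops + 3 * s_min
    return all_paths, most_selective


def adaptive_path_selection(
    paths: List[str], estimate: Callable[[str], int], hops: int
) -> Tuple[str, List[str]]:
    """Pick whichever strategy the hypothetical cost model predicts is cheaper."""
    if not paths:
        raise ValueError("query must contain at least one path")
    sel = {p: estimate(p) for p in paths}
    all_cost, single_cost = strategy_costs(sel, hops)
    if single_cost < all_cost:
        return "most-selective", [min(sel, key=sel.get)]
    return "all-paths", list(paths)


if __name__ == "__main__":
    table = CountMinPathTable()
    table.add("/book/title", 1000)
    table.add("/book/author", 800)
    table.add("/book/isbn", 3)
    hops = math.ceil(math.log2(1024))  # assume ~log2(N) hops in a 1024-node DHT
    query = ["/book/title", "/book/author", "/book/isbn"]
    print(adaptive_path_selection(query, table.estimate, hops))
```

Under this toy model, a query containing one highly selective path (such as `/book/isbn` above) favors the most-selective strategy, while a single-path query or a query whose paths are all unselective falls back to resolving the whole path set, because forwarding the original query to many nodes would then cost more than it saves.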