Journal: Data & Knowledge Engineering

Adaptive Indexing for Content-Based Search in P2P Systems

Abstract

One of the major challenges in Peer-to-Peer (P2P) file-sharing systems is supporting content-based search. Although several approaches have been proposed to address this challenge, they share a common weakness: they rely on either servers or super-peers to maintain global knowledge, which is needed to determine the importance of terms so that popular terms can be avoided during query processing. As a result, these approaches do not scale and are prone to bottlenecks caused by the heavy access load on the global knowledge maintainers. To address this, we propose a novel adaptive indexing approach for content-based search in P2P systems that can identify the importance of terms without maintaining global knowledge. Our method is based on an adaptive indexing structure that combines a Chord ring and a balanced tree. The tree is used to aggregate and classify terms adaptively, while the Chord ring is used to index the terms of nodes in the tree. Specifically, at each node of the tree, the system classifies terms as either important or unimportant. Important terms, which distinguish the node from its neighbor nodes, are indexed in the Chord ring. Unimportant terms, which are either popular or rare, are aggregated to higher-level nodes. This classification enables the system to process queries on the fly without global knowledge. Moreover, compared with methods that index terms separately, term aggregation significantly reduces the indexing cost. Taking advantage of the tree structure, we also develop an efficient search algorithm that addresses the bottleneck problem near the root. Finally, extensive experiments on both benchmark and Wikipedia datasets validate the effectiveness and efficiency of the proposed method.
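The abstract does not give the exact classification rule, so the Python below is only a minimal sketch of the per-node step under stated assumptions. Terms are hashed onto a Chord ring with SHA-1 and assigned to their successor peer, as in standard Chord. The "popular" and "rare" tests (the share of neighbor nodes containing a term, and a local-count floor), the thresholds `popular_frac` and `rare_min`, the 16-bit identifier space, and all function names are hypothetical illustrations, not the authors' method.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

M_BITS = 16  # hypothetical identifier-space size: ids live in [0, 2**16)


def chord_id(key: str, m: int = M_BITS) -> int:
    """Hash a key into the Chord identifier space [0, 2**m) using SHA-1."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(ring: list, ident: int) -> int:
    """Return the first peer id >= ident on a sorted, non-empty ring, wrapping around."""
    i = bisect_left(ring, ident)
    return ring[i % len(ring)]


def classify_terms(local: Counter, neighbors: list,
                   popular_frac: float = 0.5, rare_min: int = 2):
    """Split a tree node's terms into important and unimportant ones.

    Hypothetical rule: a term is popular if it appears in at least
    popular_frac of the neighbor nodes, and rare if it occurs fewer than
    rare_min times locally. Every other term distinguishes this node.
    """
    important, unimportant = {}, Counter()
    for term, count in local.items():
        share = sum(term in nb for nb in neighbors) / max(len(neighbors), 1)
        if share >= popular_frac or count < rare_min:
            unimportant[term] = count  # popular or rare: aggregate upward
        else:
            important[term] = count    # distinguishing: index on the ring
    return important, unimportant


def index_node(local: Counter, neighbors: list, ring: list,
               parent_buffer: Counter) -> dict:
    """Map important terms to their responsible Chord peers; push the rest to the parent."""
    important, unimportant = classify_terms(local, neighbors)
    placements = {t: successor(ring, chord_id(t)) for t in important}
    parent_buffer.update(unimportant)  # adds counts into the parent's aggregate
    return placements


# Usage: one tree node with three neighbor nodes on an 8-peer ring.
ring = sorted(chord_id(f"peer-{i}") for i in range(8))
local = Counter({"chord": 5, "the": 40, "p2p": 3, "zyzzyva": 1})
neighbors = [{"the", "p2p"}, {"the"}, {"the", "index"}]
parent = Counter()
placements = index_node(local, neighbors, ring, parent)
```

In this example, "the" appears in every neighbor (popular) and "zyzzyva" occurs once (rare), so both are aggregated into the parent's counter, while "chord" and "p2p" are placed on the ring. Passing unimportant terms upward as a single aggregate, rather than indexing each one separately, reflects the cost saving the abstract attributes to term aggregation.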