A keyword-set search system for peer-to-peer networks

Abstract

The Keyword-Set Search System (KSS) is a Peer-to-Peer (P2P) keyword search system that uses a distributed inverted index. The main challenge in a distributed index and search system is finding the right scheme to partition the index across the nodes in the network. The most obvious scheme is to partition the index by keyword. A keyword-partitioned index requires that the list of index entries for each keyword in a search be retrieved so that all the lists can be joined; only a few nodes need to be contacted, but each sends a potentially large amount of data. In KSS, the index is instead partitioned by sets of keywords. KSS builds an inverted index that maps each set of keywords to a list of all the documents that contain the words in the keyword-set. When a user issues a query, the keywords in the query are divided into sets of keywords. The document list for each set of keywords is then fetched from the network, and the lists are intersected to compute the list of matching documents. The list of index entries for each set of words is smaller than the list of entries for each individual word, so searching with KSS incurs less query-time overhead. Preliminary experiments using traces of real user queries show that the keyword-set approach is more efficient than a standard inverted index in terms of communication cost per query. Insert overhead for KSS grows exponentially with the size of the keyword-set used to generate the keys for index entries. The query overhead for the target application (metadata search in a music file sharing system) is reduced to the size of the query result, because no intermediate lists are transferred across the network for the join operation. Given our assumption that free disk space is plentiful and that queries are more frequent than insertions in P2P systems, we believe this is a good tradeoff.
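The keyword-set idea described in the abstract can be illustrated with a small, single-process sketch. It is not the authors' implementation: the class name `KeywordSetIndex`, the `set_size` parameter, the choice to index every keyword subset of size 1 up to `set_size`, and the way a query is split into groups are all assumptions made for illustration. A plain dictionary stands in for the distributed hash table that a real P2P deployment would use.

```python
from itertools import combinations


class KeywordSetIndex:
    """Toy in-memory stand-in for a keyword-set inverted index (hypothetical)."""

    def __init__(self, set_size=2):
        # Assumed parameter: the largest keyword-set size used as an index key.
        self.set_size = set_size
        # Maps frozenset of keywords -> set of document ids.
        self.index = {}

    def insert(self, doc_id, text):
        # Index every subset of the document's distinct words, for sizes
        # 1..set_size. The number of keys grows quickly with set_size.
        words = sorted(set(text.lower().split()))
        for size in range(1, self.set_size + 1):
            for combo in combinations(words, size):
                self.index.setdefault(frozenset(combo), set()).add(doc_id)

    def query_sets(self, keywords):
        # Divide the query keywords into groups that together cover every
        # keyword. If the last group is short, it is replaced by the final
        # `size` words, so it overlaps the previous group.
        words = sorted({w.lower() for w in keywords})
        if not words:
            return []
        size = min(self.set_size, len(words))
        groups = [words[i:i + size] for i in range(0, len(words), size)]
        if len(groups[-1]) < size:
            groups[-1] = words[-size:]
        return [frozenset(g) for g in groups]

    def search(self, keywords):
        # Fetch one document list per keyword-set, then intersect them.
        key_sets = self.query_sets(keywords)
        if not key_sets:
            return set()
        lists = [self.index.get(k, set()) for k in key_sets]
        return set.intersection(*lists)


idx = KeywordSetIndex(set_size=2)
idx.insert("d1", "miles davis kind of blue")
idx.insert("d2", "miles davis bitches brew")
idx.insert("d3", "blue train john coltrane")
print(idx.search(["miles", "blue"]))
```

In this sketch a two-word query is answered by a single lookup whose result is already the final answer, which mirrors the abstract's point that no intermediate lists need to be shipped for a join. The cost shows up at insert time: under the assumptions above, a document with five distinct words produces 5 single-word keys plus 10 pair keys, 15 in total.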
