Query routing in cooperative semi-structured peer-to-peer information retrieval networks

Abstract

Conventional web search engines are centralised in that a single entity crawls and indexes the documents selected for future retrieval and controls the relevance models used to determine which documents are relevant to a given user query. As a result, these search engines suffer from several technical drawbacks, such as handling scale, timeliness and reliability, in addition to ethical concerns such as commercial manipulation and information censorship. Peer-to-Peer (P2P) Information Retrieval (IR) has been proposed as a solution that removes the need to rely entirely on a single entity, as it distributes the functional components of a web search engine (from crawling and indexing documents to query processing) across the network of users (or peers) who use the search engine. This strategy for constructing an IR system poses several efficiency and effectiveness challenges which have been identified in past work. Accordingly, this thesis makes several contributions towards advancing the state of the art in P2P-IR effectiveness by improving the query processing and relevance scoring aspects of P2P web search.

Federated search systems are a form of distributed information retrieval that routes the user's information need, formulated as a query, to distributed resources and merges the retrieved result lists into a final list. P2P-IR networks are one form of federated search, routing queries and merging results among participating peers. The query is propagated through the distributed nodes to reach the peers that are most likely to contain relevant documents, and the retrieved result lists are then merged at different points along the path from the relevant peers back to the query initiator (that is, the customer). However, query routing is considered one of the major challenges and a critical part of P2P-IR networks, as relevant peers might be missed through low-quality peer selection during query routing, which inevitably leads to less effective retrieval results. This motivates this thesis to study and propose query routing techniques that improve retrieval quality in such networks.

Cluster-based semi-structured P2P-IR networks exploit the cluster hypothesis to organise peers into semantically similar clusters, where each semantic cluster is managed by super-peers. In this thesis, I construct three semi-structured P2P-IR models and examine their retrieval effectiveness. I also leverage the cluster centroids at the super-peer level, as content representations gathered from cooperative peers, to propose a query routing approach called Inverted PeerCluster Index (IPI), which simulates the conventional inverted index of a centralised corpus to organise the statistics of peers' terms. The results show competitive retrieval quality in comparison to baseline approaches. Furthermore, I study the applicability of conventional Information Retrieval models as peer selection approaches, where each peer is treated as one big document made up of its documents. The experimental evaluation shows comparable and significant results and indicates that document retrieval methods are very effective for peer selection, which brings back the analogy between documents and peers. Additionally, Learning to Rank (LtR) algorithms are exploited to build a learned classifier for peer ranking at the super-peer level. The experiments show significant results against state-of-the-art resource selection methods and competitive results against corresponding classification-based approaches.

Finally, I propose reputation-based query routing approaches that exploit the idea, taken from social community networks, of collecting feedback on a specific item and managing it for future decision-making. The system monitors users' behaviour when they click or download documents from the final ranked list, treats it as implicit feedback, and mines this information to build a reputation-based data structure. The data structure is used to score peers and then rank them for query routing. I conduct a set of experiments covering various scenarios, including noisy feedback (i.e., positive feedback on non-relevant documents), to examine the robustness of the reputation-based approaches. The empirical evaluation shows significant results in almost all evaluation metrics, with an approximate improvement of more than 56% compared to baseline approaches. Thus, based on the results, if one were to choose a single technique, reputation-based approaches are clearly the natural choice, and they can also be deployed on any P2P network.
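
The idea of treating each peer as one big document and ranking peers through an inverted index of their term statistics can be illustrated with a small sketch. This is not the IPI method from the thesis, whose details the abstract does not give; it is a generic TF-IDF peer ranking under stated assumptions. The function names `build_peer_index` and `rank_peers`, the whitespace tokenisation, the particular TF-IDF weighting and the toy peers are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_peer_index(peer_docs):
    """Map each term to {peer_id: term frequency}.

    All documents held by a peer are concatenated, so the peer is
    treated as a single big document (an assumption of this sketch).
    """
    index = defaultdict(dict)
    for peer_id, docs in peer_docs.items():
        counts = Counter(term for doc in docs for term in doc.lower().split())
        for term, tf in counts.items():
            index[term][peer_id] = tf
    return index


def rank_peers(index, num_peers, query, k=2):
    """Score peers with a simple TF-IDF sum and return the top-k peer ids."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term unknown to every peer
        # Inverse "peer frequency": terms held by few peers weigh more.
        idf = math.log(1 + num_peers / len(postings))
        for peer_id, tf in postings.items():
            scores[peer_id] += (1 + math.log(tf)) * idf
    # Sort by descending score, breaking ties by peer id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [peer_id for peer_id, _ in ranked[:k]]


# Hypothetical toy network of three peers.
peers = {
    "peer_a": ["query routing in peer networks", "routing tables"],
    "peer_b": ["cooking pasta recipes"],
    "peer_c": ["peer selection and query processing"],
}
index = build_peer_index(peers)
selected = rank_peers(index, len(peers), "query routing")
print(selected)
```

In this toy setting a super-peer would forward the query only to the peers in `selected`, so a peer sharing no query terms is never contacted. A real system would replace the weighting with whichever retrieval model is being evaluated.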
