Improving search in unstructured peer-to-peer systems.

Abstract

Peer-to-peer (P2P) systems have been studied as alternatives to centralized systems because they can federate the processing, storage, and network resources of a set of independently owned peers for the common good. Various distributed applications have been built on top of P2P infrastructure. For example, systems such as PPLive and Joost use peers to stream video, while systems such as Gnutella and BitTorrent use peers to distribute files.

In this thesis, I focus on P2P file-sharing systems. Peers in file-sharing systems form an application-level distributed overlay to manage communication among themselves, and then use this overlay to locate files available from other peers. The effectiveness of locating files in unstructured P2P systems depends on the ease with which other peers can be reached. Prior research efforts created overlays with good network behavior but did not fully investigate the nature of query resolution. In general, queries for objects are resolved by matching terms in the query to terms in the object annotations. The performance of file-sharing systems therefore depends not only on the ability to route queries to various peers, but also on whether the query itself can be resolved.

We performed an extensive analysis of the annotations of the files stored in the Gnutella and iTunes P2P systems, as well as the queries that users issued to locate objects in Gnutella. Despite differences in how objects were annotated in Gnutella and iTunes, the overall popularity distribution of objects was similar. I observed a fundamental mismatch between the way that objects were named and the way that they were searched for in Gnutella: terms that were popular in the file annotations were not popular in the queries. This mismatch meant that many search queries were unlikely to find matching objects in the system. I showed that approximately 55% of the queries in Gnutella had no matching files. Optimizing the overlays or the search mechanisms themselves was unlikely to resolve the mismatch between query terms and file annotations. These results remained consistent across my analysis of the P2P system over several years, suggesting that both the users' query behavior and the way that objects were annotated had stabilized.

In this thesis, I addressed the problem of queries left unresolved by mismatches between query terms and object annotations. My approach transformed the original query by selectively removing terms in order to broaden its scope. The challenge was to choose which query terms to remove without analyzing user intent or performing semantic analysis of the queries and objects. I removed query terms based on data observed from a small number of peers (<1%) before issuing the modified query against the entire P2P system. Using captured queries and files shared by peers in Gnutella, I showed that my approach improved the maximum success rate from about 45% to over 85% of the queries. We also performed an experimental analysis of the query term selection algorithm on a live Gnutella network and showed that over 61% of the rewritten queries were resolved successfully by the Gnutella network.
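The abstract does not spell out the term selection algorithm, so the following is only a minimal sketch of the general idea of broadening a query by dropping terms, under stated assumptions. It assumes that a query matches a file when every query term appears in the file's annotation, and that the term to drop is the one that is rarest in the annotations observed on a small sample of peers. The function names (`tokenize`, `matches`, `broaden_query`), the `min_terms` parameter, and the rarest-term heuristic are all hypothetical and are not taken from the thesis.

```python
from collections import Counter


def tokenize(text):
    """Split an annotation or query into a set of lowercase terms."""
    return set(text.lower().split())


def matches(terms, annotation_terms):
    """Assumed matching rule: every query term must appear in the annotation."""
    return set(terms) <= annotation_terms


def broaden_query(query, sampled_annotations, min_terms=1):
    """Drop terms until the query matches a sampled file (hypothetical heuristic).

    sampled_annotations stands in for the annotations observed on a small
    sample of peers. The term removed at each step is the one that occurs in
    the fewest sampled annotations; this choice is an assumption made for
    illustration, not the thesis's algorithm.
    """
    files = [tokenize(a) for a in sampled_annotations]
    # Number of sampled files whose annotation contains each term.
    doc_freq = Counter(term for f in files for term in f)
    # Deduplicate query terms while keeping their original order.
    terms = list(dict.fromkeys(query.lower().split()))
    while len(terms) > min_terms and not any(matches(terms, f) for f in files):
        rarest = min(terms, key=lambda t: doc_freq[t])
        terms.remove(rarest)
    return terms


sample = [
    "madonna like a prayer mp3",
    "madonna vogue live",
    "beatles yesterday mp3",
]
rewritten = broaden_query("madonna prayer remix 2007", sample)
print(" ".join(rewritten))
```

In this sketch the rewritten query would then be issued against the whole network. No semantic analysis is involved: the only signal used is how often each term occurs in the sampled annotations.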

Bibliographic details

  • Author affiliation: University of Notre Dame
  • Degree-granting institution: University of Notre Dame
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 86
  • Original format: PDF
  • Language of text: English
