Usage meets link analysis: Towards improving intranet and site specific search via usage statistics.


Abstract

This thesis explores the possibility of incorporating usage statistics to improve ranking quality in site specific and intranet search engines. A number of usage based ranking approaches are introduced, including a PageRank extension, Usage aware PageRank (UPR); an extension to HITS (UNITS); and a naive approach that uses the number of visits to pages as a quality measure. These methods are compared against each other and against two major link analysis approaches: PageRank and HITS. Weighting schemes are explored that take into account the probability of visiting a page directly (by typing or via bookmarks), as well as the relative probability of following a particular link from a given page. Both of these probabilities can be approximated from usage logs. Experiments are carried out using a site specific search engine incorporating the above methods, using 6+ months of usage logs centered around the snapshot. The parameter space for UPR and UNITS is sampled to examine the effects of varying usage emphasis factors. Experiments suggest that one of the proposed methods, UPR, is promising and has a number of desirable properties: it generalizes PageRank, inherits basic PageRank properties, and is stable and flexible. Usage based signals such as UPR can be especially useful in an intranet/site specific search setting, where documents tend to be poorly connected compared to the Web, but where there is inherently little or no incentive for spamming.
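The two usage-derived probabilities mentioned in the abstract can be illustrated with a small sketch. The code below is not the thesis's UPR formulation, which the abstract does not give. It is a generic PageRank-style iteration, written under the assumption that a single hypothetical `usage_weight` parameter blends uniform probabilities with log-derived ones. The function name, the input dictionaries, and the sample data are all made up for illustration. Setting `usage_weight=0.0` reduces the sketch to ordinary PageRank with a uniform jump distribution.

```python
def usage_weighted_pagerank(links, link_clicks, direct_visits,
                            damping=0.85, usage_weight=0.5, iterations=50):
    """Illustrative usage-blended PageRank (an assumption, not the thesis's UPR).

    links:         dict mapping a page to the list of pages it links to
    link_clicks:   dict mapping (source, target) to a click count from logs
    direct_visits: dict mapping a page to a count of direct visits
                   (typed URL or bookmark) from logs
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)

    # Jump distribution: blend a uniform jump with the share of direct visits.
    total_direct = sum(direct_visits.get(p, 0) for p in pages)
    jump = {}
    for p in pages:
        uniform = 1.0 / n
        usage = direct_visits.get(p, 0) / total_direct if total_direct else uniform
        jump[p] = (1 - usage_weight) * uniform + usage_weight * usage

    # Link-following distribution: blend uniform choice with click shares.
    trans = {}
    for src in pages:
        targets = sorted(set(links.get(src, [])))
        if not targets:
            continue
        total_clicks = sum(link_clicks.get((src, t), 0) for t in targets)
        for t in targets:
            uniform = 1.0 / len(targets)
            usage = (link_clicks.get((src, t), 0) / total_clicks
                     if total_clicks else uniform)
            trans[(src, t)] = (1 - usage_weight) * uniform + usage_weight * usage

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outlinks is redistributed via the jump vector.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping + damping * dangling) * jump[p] for p in pages}
        for (src, t), prob in trans.items():
            new_rank[t] += damping * rank[src] * prob
        rank = new_rank
    return rank


# Hypothetical three-page site and made-up log counts.
links = {"home": ["docs", "news"], "docs": ["home"], "news": ["home"]}
link_clicks = {("home", "docs"): 90, ("home", "news"): 10}
direct_visits = {"home": 50, "docs": 40, "news": 10}

plain = usage_weighted_pagerank(links, link_clicks, direct_visits, usage_weight=0.0)
usage = usage_weighted_pagerank(links, link_clicks, direct_visits, usage_weight=0.8)
```

In this toy site, `docs` and `news` are structurally identical, so link structure alone cannot separate them. Once the logs are given weight, `docs` ranks above `news` because it receives more clicks and more direct visits.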

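Bibliographic details

  • Author

    Oztekin, Bilgehan Uygar.

  • Author affiliation

    University of Minnesota.

  • Degree-granting institution: University of Minnesota.
  • Discipline: Computer Science.
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 87 p.
  • Total pages: 87
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification: Automation technology, computer technology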
