Riemann space model and similarity-based Web retrieval.

Abstract

Similarity-based matching is widely used in the vector space model. However, its broader adoption is hampered by disagreements over how similarity measures should be constructed and over how large databases should be indexed so that similarity matching is feasible at all. This thesis aims to overcome these obstacles and to establish a theoretical basis and implementation guidelines for applying similarity-based matching in Web retrieval.

The thesis analyzes the vector space model and shows that Web space is more accurately modeled as a curved space than as a Euclidean space. On this basis, the thesis argues that it is inappropriate to apply a single similarity/dissimilarity measure globally across Web space. The thesis proposes a Riemann space model that explains previously unexplained phenomena. In the Riemann space model, dissimilarity functions are unified into a single form, the geodesic distance, which can be computed locally with a uniform formula. To some extent, this answers the long-standing open problem of identifying the conditions under which a particular similarity/dissimilarity measure should be used.

Based on the Riemann space model, we propose a multi-stage approach that combines exact matching and partial matching in the design of new Web retrieval systems. In this approach, a retrieval system first forms a neighborhood of the query, which can be done using exact matching. Then, within the chosen neighborhood, more complex similarity-based matching is performed, and documents are ranked by their geodesic distances to the query. This is equivalent to using a ranking function designed specifically for that neighborhood. Because similarity-based matching is performed only within a neighborhood, the computational cost of the search process is reduced.
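A minimal Python sketch of the two-stage idea follows, under assumptions that are ours rather than the thesis's: documents are bag-of-words term-frequency vectors, the neighborhood is formed by exact Boolean AND matching over an inverted index, and the geodesic distance is taken to be the great-circle distance between length-normalized vectors on the unit hypersphere (the arccos of the cosine similarity). In a Riemann space the geodesic distance between two points is the length of the shortest curve joining them under the local metric; the hypersphere is simply one familiar curved space where that distance has a closed form. All function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def build_inverted_index(docs):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, vec in docs.items():
        for term in vec:
            index[term].add(doc_id)
    return index


def exact_match_neighborhood(query_terms, index):
    """Stage 1 (assumed Boolean AND): ids of docs containing every query term."""
    postings = [index.get(term, set()) for term in query_terms]
    return set.intersection(*postings) if postings else set()


def sphere_geodesic(u, v):
    """Great-circle distance between u and v projected onto the unit hypersphere.

    Equals arccos(cosine similarity); an illustrative geodesic, not the thesis's formula.
    """
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return math.pi / 2  # treat an empty vector as orthogonal to everything
    cos_sim = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.acos(cos_sim)


def multi_stage_search(query, docs, index):
    """Form the neighborhood by exact matching, then rank it by geodesic distance."""
    q = tf_vector(query)
    neighborhood = exact_match_neighborhood(list(q), index)
    # Stage 2: similarity-based matching runs only inside the neighborhood.
    return sorted(neighborhood, key=lambda doc_id: (sphere_geodesic(q, docs[doc_id]), doc_id))


docs = {
    "d1": tf_vector("riemann space model for web retrieval"),
    "d2": tf_vector("web retrieval web retrieval engines"),
    "d3": tf_vector("euclidean space model"),
}
index = build_inverted_index(docs)
print(multi_stage_search("web retrieval", docs, index))  # ['d2', 'd1']
```

Only documents in the exact-match neighborhood are scored, so in this sketch the cost of the second stage grows with the neighborhood size rather than with the collection size; `d3` never reaches the ranking stage.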
The Riemann space model provides a sound theoretical basis for this multi-stage approach.

As a demonstration of its application, we designed and implemented a personal Web retrieval (PWR) system. Unlike current search engines, subject trees, and metasearch engines, this system is a client-side program that works like a personal secretary: it reads Web documents, ranks them by their geodesic distances to the query, and also takes the user's general search interests into account. It can be viewed as a prototype of an intelligent Web retrieval system.
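The abstract does not say how PWR weighs the user's general interests against the query. The sketch below assumes, purely for illustration, that interests are kept as a term-frequency profile built from past queries and that each fetched document is scored by a weighted sum of its angular (hypersphere geodesic) distance to the query and to that profile. `build_profile`, `personal_rank`, and the `interest_weight` parameter are hypothetical names, not part of the described system.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def angular_distance(u, v):
    """Geodesic (great-circle) distance between u and v on the unit hypersphere."""
    dot = sum(weight * v.get(term, 0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return math.pi / 2  # an empty vector carries no direction; treat as orthogonal
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))


def build_profile(history):
    """Hypothetical interest profile: summed term counts over the user's past queries."""
    profile = Counter()
    for text in history:
        profile.update(tf_vector(text))
    return profile


def personal_rank(query, fetched_docs, profile, interest_weight=0.3):
    """Rank fetched documents by a blend of query distance and profile distance."""
    q = tf_vector(query)

    def score(doc_id):
        d = tf_vector(fetched_docs[doc_id])
        blended = ((1 - interest_weight) * angular_distance(q, d)
                   + interest_weight * angular_distance(profile, d))
        return (blended, doc_id)  # ties fall back to the document id

    return sorted(fetched_docs, key=score)


profile = build_profile(["riemann geometry", "geodesic distance riemann"])
fetched = {
    "a": "web retrieval for online shopping",
    "b": "web retrieval with riemann geometry",
}
print(personal_rank("web retrieval", fetched, profile))                       # ['b', 'a']
print(personal_rank("web retrieval", fetched, profile, interest_weight=0.0))  # ['a', 'b']
```

Both documents are equally distant from the query, so with `interest_weight=0.0` the tie falls back to document id; the profile term is what moves `b` ahead. A real client-side system would also need to decide how the profile is learned and updated, which this sketch leaves open.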

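Bibliographic Details

  • Author: Wang, Zhiwei
  • Affiliation: The University of Regina (Canada)
  • Degree-granting institution: The University of Regina (Canada)
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2001
  • Pages: 167
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology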
