A holistic, similarity-based approach for personalized ranking in web databases.

Abstract

With the advent of the Web, the notion of "information retrieval" has acquired a completely new connotation. It now spans several disciplines, from traditional text and data retrieval over unstructured and structured repositories to the retrieval of static and dynamic information from the surface and deep Web. From the end user's point of view, the common thread binding all these areas is the need to give users suitable ways to specify their intent (i.e., the user input) and to display the resulting output ranked in an order relevant to them.

For specifying a user's intent, the paradigms of querying and searching have served well as the staple mechanisms of information retrieval over structured and unstructured repositories. Processing queries over known, structured repositories (e.g., traditional and Web databases) is well understood, and search has become ubiquitous for unstructured repositories (e.g., document collections and the surface Web). Searching structured repositories has also been explored, though only to a limited extent. However, there has been little work on querying unstructured sources, which, we believe, is the next step toward focused retrieval.

Correspondingly, one important contribution of this dissertation is a novel semantics-guided approach, termed Query-By-Keywords (QBK), for generating queries over unstructured repositories from search-like inputs. Instead of burdening the user with schema details, QBK uses pre-discovered semantic information (taxonomies, context-based relationships among keywords, and attribute and operator compatibility) to generate query skeletons, which are subsequently transformed into queries. In addition, progressive feedback from users is used to further improve the accuracy of these query skeletons.
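The abstract does not describe how QBK is implemented, so the following is only a minimal Python sketch, under stated assumptions, of the general idea: keywords are matched against a taxonomy, operator words are kept only when compatible with the attribute's type, the matches form a query skeleton of (attribute, operator, value) triples, and the skeleton is rendered as a query. The taxonomy contents, the attribute types, the operator words, the rule that a bare number means `price`, and the functions `query_skeleton` and `to_sql` are all hypothetical; the progressive-feedback step is omitted.

```python
# Hypothetical taxonomy mapping known keyword values to the attribute they belong to.
TAXONOMY = {
    "honda": "make",
    "toyota": "make",
    "civic": "model",
    "red": "color",
    "blue": "color",
}

# Hypothetical attribute types, used for the attribute/operator compatibility check.
ATTRIBUTE_TYPES = {
    "make": "categorical",
    "model": "categorical",
    "color": "categorical",
    "price": "numeric",
}

# Hypothetical keywords that signal a comparison operator.
OPERATOR_WORDS = {"under": "<", "below": "<", "over": ">", "above": ">"}

# Operators each attribute type is assumed to accept.
COMPATIBLE_OPS = {"categorical": {"="}, "numeric": {"=", "<", ">"}}

# Assumption: a bare number in the input refers to the price attribute.
DEFAULT_NUMERIC_ATTR = "price"


def query_skeleton(keywords):
    """Map search-like keywords to (attribute, operator, value) triples."""
    skeleton = []
    pending_op = None  # operator word waiting for the value it qualifies
    for kw in (k.lower() for k in keywords):
        if kw in OPERATOR_WORDS:
            pending_op = OPERATOR_WORDS[kw]
            continue
        if kw.isdigit():
            attr, value = DEFAULT_NUMERIC_ATTR, int(kw)
        elif kw in TAXONOMY:
            attr, value = TAXONOMY[kw], kw
        else:
            pending_op = None  # unrecognized keyword: skipped in this sketch
            continue
        # Keep the pending operator only if the attribute's type accepts it.
        allowed = COMPATIBLE_OPS[ATTRIBUTE_TYPES[attr]]
        op = pending_op if pending_op in allowed else "="
        skeleton.append((attr, op, value))
        pending_op = None
    return skeleton


def to_sql(table, skeleton):
    """Render a skeleton as a SQL string (illustration only: values are not escaped)."""
    conditions = [
        f"{attr} {op} {value!r}" if isinstance(value, str) else f"{attr} {op} {value}"
        for attr, op, value in skeleton
    ]
    where = " AND ".join(conditions) if conditions else "1=1"
    return f"SELECT * FROM {table} WHERE {where}"


print(to_sql("vehicles", query_skeleton(["Honda", "red", "under", "10000"])))
# prints: SELECT * FROM vehicles WHERE make = 'honda' AND color = 'red' AND price < 10000
```

Keeping the skeleton separate from the final query string is a deliberate choice in this sketch: a fuller version could let the user correct the skeleton before execution, which is roughly where progressive feedback would fit, and would bind values as query parameters instead of interpolating them.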
Overall, the aim of QBK is to offer an alternative paradigm for generating queries on unstructured repositories from as little user input as possible.

Irrespective of how intent is specified (i.e., as a search or as a query request), the number of results returned is often extremely large. This is particularly true of the deep Web, where queries on Web databases return many results and choosing the most useful answer(s) becomes a tedious, time-consuming task. Most of the time, the user is not interested in all the answers; instead, s/he would prefer results ranked by her/his interests, characteristics, and past usage to be displayed before the rest. Moreover, these preferences vary as users and queries change.

Accordingly, this dissertation proposes a novel similarity-based framework for user- and query-dependent ranking of query results in Web databases. The framework rests on two intuitions: for the results of a given query, similar users display comparable ranking preferences; and a single user displays analogous ranking preferences over the results of similar queries. The framework is therefore supported by two novel, comprehensive models proposed in this work: (1) Query Similarity and (2) User Similarity. In addition, it relies on a small yet representative workload of ranking functions, collected across several user-query pairs, to rank the results of a given user query at runtime. We therefore also address the problem of establishing a workload of ranking functions that best assists the similarity model in achieving user- and query-dependent ranking.
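The abstract names the two similarity models but not their definitions, so the following is a minimal Python sketch of how a similarity-based framework of this kind could choose a ranking function at runtime. It assumes, purely for illustration, that each workload entry stores a numeric user-profile vector, a query represented as a set of (attribute, value) conditions, and a linear ranking function (attribute weights). Jaccard overlap and cosine similarity are swapped in as stand-ins for the dissertation's query- and user-similarity models, and blending them with a weight `alpha` is likewise an assumption. `WorkloadEntry`, `select_ranking_function`, and `rank` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class WorkloadEntry:
    """One collected ranking function and the (user, query) pair it came from."""
    user_profile: tuple[float, ...]  # hypothetical numeric user-profile vector
    query: frozenset                 # query as a set of (attribute, value) conditions
    weights: dict[str, float]        # hypothetical linear ranking function


def query_similarity(q1: frozenset, q2: frozenset) -> float:
    """Jaccard overlap of query conditions (stand-in for the query-similarity model)."""
    union = q1 | q2
    return len(q1 & q2) / len(union) if union else 1.0


def user_similarity(u1, u2) -> float:
    """Cosine similarity of profile vectors (stand-in for the user-similarity model)."""
    dot = sum(a * b for a, b in zip(u1, u2))
    norm = math.sqrt(sum(a * a for a in u1)) * math.sqrt(sum(b * b for b in u2))
    return dot / norm if norm else 0.0


def select_ranking_function(user_profile, query, workload, alpha=0.5):
    """Return the weights of the workload entry most similar to (user, query).

    alpha blends user similarity against query similarity (an assumption).
    """
    def score(entry: WorkloadEntry) -> float:
        return (alpha * user_similarity(user_profile, entry.user_profile)
                + (1 - alpha) * query_similarity(query, entry.query))
    return max(workload, key=score).weights


def rank(results, weights):
    """Sort result rows (dicts) by descending linear score; missing attributes count as 0."""
    def utility(row):
        return sum(w * row.get(attr, 0.0) for attr, w in weights.items())
    return sorted(results, key=utility, reverse=True)


# Toy workload with two hypothetical entries.
workload = [
    WorkloadEntry((1.0, 0.0), frozenset({("make", "honda")}), {"year": 1.0, "price": -0.001}),
    WorkloadEntry((0.0, 1.0), frozenset({("make", "bmw")}), {"mileage": -0.01}),
]
weights = select_ranking_function(
    (0.9, 0.1), frozenset({("make", "honda"), ("color", "red")}), workload)
results = [{"year": 2008, "price": 9000}, {"year": 2010, "price": 12000}]
print(rank(results, weights))
# prints: [{'year': 2008, 'price': 9000}, {'year': 2010, 'price': 12000}]
```

Here the first entry wins because both its user profile and its query overlap the input, and under its weights the cheaper 2008 car scores slightly higher (about 1999 vs 1998). Picking a single nearest entry keeps the sketch short; a fuller version might instead combine several similar entries, weighting each by its similarity.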
Furthermore, we propose a novel probabilistic learning model that infers the individual ranking functions in this workload from users' implicit browsing behavior. We establish the effectiveness of the complete ranking framework through an experimental evaluation on Google Base's vehicle and real estate databases, with the help of Amazon's Mechanical Turk users.

Bibliographic details

  • Author: Telang, Aditya
  • Author affiliation: The University of Texas at Arlington
  • Degree-granting institution: The University of Texas at Arlington
  • Discipline: Information Technology; Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 196
  • Original format: PDF
  • Language: English
  • Date added to database: 2022-08-17 11:44:23