Federated Search for Heterogeneous Environments.

Abstract

In information retrieval, federated search is the problem of automatically searching across multiple distributed collections or resources. It is typically decomposed into two sequential steps: deciding which resources to search (resource selection) and deciding how to combine results from multiple resources into a single presentation (results merging). Federated search occurs in different environments. This dissertation focuses on an environment that has not been deeply investigated in prior work.

The growing heterogeneity of digital media and the broad range of user information needs that occur in today's world have given rise to a multitude of systems that each specialize in a specific type of search task. Examples include search for news, images, video, local businesses, items for sale, and even social-media interactions. In the Web search domain, these specialized systems are called verticals, and one important task for the Web search engine is the prediction and integration of relevant vertical content into the Web search results. This is known as aggregated web search and is the main focus of this dissertation.

Providing a single point of access to all these diverse systems requires federated search solutions that can support result-type and retrieval-algorithm heterogeneity. This type of heterogeneity violates major assumptions made by state-of-the-art resource selection and results merging methods.

While existing resource selection methods derive predictive evidence exclusively from sampled resource content, the approaches proposed in this dissertation draw on machine learning as a means to easily integrate different types of evidence. These include, for example, evidence derived from (sampled) vertical content, vertical query traffic, click-through information, and properties of the query string. In order to operate in a heterogeneous environment, we focus on methods that can learn a vertical-specific relationship between features and relevance. We also present methods that reduce the need for human-produced training data.

Existing results merging methods formulate the task as score normalization. In a more heterogeneous environment, however, combining results into a single presentation requires satisfying a number of layout constraints. The dissertation proposes a novel formulation of the task: block ranking. In block ranking, the objective is to rank sequences of results that must appear grouped together (vertically or horizontally) in the final presentation. Based on this formulation, the dissertation proposes and empirically validates a cost-effective methodology for evaluating aggregated web search results. Finally, it proposes the use of machine learning methods for the task of block ranking.
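The block-ranking formulation in the abstract can be illustrated with a small sketch. This is not the dissertation's method: the names `Block`, `rank_blocks`, `flatten`, and `min_score`, the example scores, and the rule that Web blocks are always kept are all assumptions made for illustration. The sketch only shows the core idea that whole blocks are ordered and each block's results stay contiguous on the page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A sequence of results that must stay grouped in the final page."""
    source: str          # illustrative labels such as "web", "news", "images"
    results: List[str]   # result identifiers, already ordered within the block
    score: float         # hypothetical block-level relevance score


def rank_blocks(blocks: List[Block], min_score: float = 0.0) -> List[Block]:
    """Order blocks by descending score.

    Assumption of this sketch: Web blocks are always shown, while a
    vertical block is dropped when its score falls below min_score.
    """
    kept = [b for b in blocks if b.source == "web" or b.score >= min_score]
    return sorted(kept, key=lambda b: b.score, reverse=True)


def flatten(ranked: List[Block]) -> List[str]:
    """Lay out the page so that each block's results remain contiguous."""
    return [r for b in ranked for r in b.results]


# Made-up scores; in practice they would come from a learned model.
blocks = [
    Block("web", ["w1", "w2", "w3"], 0.6),
    Block("news", ["n1", "n2"], 0.9),
    Block("images", ["i1"], 0.1),
    Block("web", ["w4", "w5"], 0.4),
]
ranked = rank_blocks(blocks, min_score=0.3)
page = flatten(ranked)
```

With these made-up scores the news block is placed first, the two Web blocks follow in score order, and the low-scoring images block is left off the page. Unlike score normalization over individual results, the unit being ranked is the block, so no result is ever separated from the rest of its group.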

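Bibliographic record

  • Author

    Arguello, Jaime

  • Author affiliation

    Carnegie Mellon University

  • Degree-granting institution: Carnegie Mellon University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 171 p.
  • Total pages: 171
  • Original format: PDF
  • Language of text: English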