Federated Search in Heterogeneous Environments

Abstract

In information retrieval, federated search is the problem of automatically searching across multiple distributed collections or resources. It is typically decomposed into two sequential steps: deciding which resources to search (resource selection) and deciding how to combine results from multiple resources into a single presentation (results merging). Federated search occurs in different environments. This dissertation focuses on an environment that has not been deeply investigated in prior work.

The growing heterogeneity of digital media and the broad range of user information needs that occur in today's world have given rise to a multitude of systems that specialize in a specific type of search task. Examples include search for news, images, video, local businesses, items for sale, and even social-media interactions. In the Web search domain, these specialized systems are called verticals, and one important task for the Web search engine is the prediction and integration of relevant vertical content into the Web search results. This is known as aggregated web search and is the main focus of this dissertation.

Providing a single point of access to all these diverse systems requires federated search solutions that can support result-type and retrieval-algorithm heterogeneity. This type of heterogeneity violates major assumptions made by state-of-the-art resource selection and results merging methods and motivates the development of new techniques.

While existing resource selection methods derive evidence exclusively from sampled resource content, the approaches proposed in this dissertation draw on machine learning as a means to easily integrate different types of evidence. These include, for example, evidence derived from (sampled) vertical content, vertical query traffic, click-through information, and properties of the query string. In order to operate in a heterogeneous environment, we focus on methods that can learn a vertical-specific relationship between features and relevance. We also present methods that reduce the need for human-produced training data. In particular, we focus on the situation where we have vertical-relevance judgments for some verticals and want to learn a predictive model for a vertical associated with no training data.

Existing results merging methods formulate the task as score normalization. In a more heterogeneous environment, however, combining results into a single presentation requires satisfying a number of layout constraints. The dissertation proposes a novel formulation of the task: block ranking. In block ranking, the objective is to rank sequences of results that must appear grouped together (vertically or horizontally) in the final presentation. Based on this formulation, the dissertation proposes and empirically validates a cost-effective methodology for evaluating aggregated web search results. Finally, it proposes the use of machine learning methods for the task of block ranking.
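
To make the block-ranking formulation concrete, the following is a minimal Python sketch and not the dissertation's method. It assumes each block already carries a relevance score (in the dissertation this would come from a learned model; here the numbers are made up), and the names `Block`, `rank_blocks`, `flatten`, and the `min_score` threshold are hypothetical. It only illustrates the layout constraint: results inside a block stay together, and whole blocks are what gets ordered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    source: str         # hypothetical label, e.g. "web", "news", "images"
    results: List[str]  # results that must appear grouped together
    score: float        # assumed block-level relevance score (made-up values below)


def rank_blocks(blocks: List[Block], min_score: float = 0.0) -> List[Block]:
    """Order whole blocks by descending score.

    Assumption for this sketch: web blocks are always shown, while a
    vertical block is suppressed if its score falls below min_score.
    """
    kept = [b for b in blocks if b.source == "web" or b.score >= min_score]
    # sorted() is stable, so blocks with equal scores keep their input order.
    return sorted(kept, key=lambda b: b.score, reverse=True)


def flatten(ranked: List[Block]) -> List[str]:
    """Produce the final presentation; each block's results stay contiguous."""
    return [result for block in ranked for result in block.results]


blocks = [
    Block("web", ["w1", "w2", "w3"], 0.9),
    Block("news", ["n1", "n2"], 0.7),
    Block("web", ["w4", "w5"], 0.5),
    Block("images", ["i1", "i2", "i3"], 0.2),
]
page = flatten(rank_blocks(blocks, min_score=0.3))
print(page)
```

With these illustrative scores the news block is placed between the two web blocks and the images block is suppressed. A score-normalization approach would instead interleave individual results, which is exactly what the grouping constraint rules out.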

Bibliographic record

  • Source
    ACM SIGIR FORUM | 2012, Issue 1 | pp. 78-79 | 2 pages
  • Author

    Jaime Arguello

  • Affiliation

    School of Information and Library Science, University of North Carolina, Chapel Hill, NC 27599, USA

  • Original format: PDF
  • Language: English
