Conference: International ACM SIGIR Conference on Research and Development in Information Retrieval

Content-based Retrieval for Heterogeneous Domains: Domain Adaptation by Relative Aggregation Points

Abstract

We introduce the problem of domain adaptation for content-based retrieval and propose a domain adaptation method based on relative aggregation points (RAPs). Content-based retrieval, which includes image retrieval and spoken document retrieval, lets a user input examples as a query and retrieves relevant data based on similarity to those examples. However, the input examples and the relevant data can be dissimilar, especially when the domain from which the user selects examples differs from the domain from which the system retrieves data. In content-based geographic object retrieval, for example, suppose that a user who lives in Beijing visits Kyoto, Japan, and wants to use a content-based retrieval system to find relatively inexpensive restaurants serving popular local dishes. Because such restaurants in Beijing and Kyoto differ in average cost and in each area's popular dishes, it is difficult to find relevant restaurants in Kyoto from examples selected in Beijing. We address this problem by assuming that RAPs in different domains correspond to one another: they may be dissimilar, but they play the same role. A RAP is defined as the expectation of the instances in a domain that are classified into a certain class, e.g., the most expensive restaurants, average restaurants, and restaurants serving the most popular dishes. Our proposed method constructs a new feature space based on the RAPs estimated in each domain, bridging the domain difference to improve content-based retrieval in heterogeneous domains. To verify the effectiveness of the proposed method, we evaluated various methods on a test collection developed for content-based geographic object retrieval. Experimental results show that the proposed method achieved significant improvements over baseline methods. Moreover, we observed that the search performance of content-based retrieval in heterogeneous domains was significantly lower than that in homogeneous domains. This finding suggests that relevant data for the same search intent depend on the search context, that is, the location where the user searches and the domain from which the system retrieves data.
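The idea of comparing instances relative to per-domain RAPs can be illustrated with a small sketch. The abstract does not say how the new feature space is built, so the construction below (concatenating an instance's offsets from each RAP of its own domain) is an assumption made for illustration, not the authors' method. The function names, the two class labels, and the toy `[price, popularity]` vectors are all hypothetical.

```python
from math import sqrt

def rap(instances):
    """Empirical expectation (mean vector) of the instances in one class."""
    n = len(instances)
    return [sum(column) / n for column in zip(*instances)]

def estimate_raps(classified):
    """classified: dict mapping a class label to that class's feature vectors."""
    return {label: rap(vectors) for label, vectors in classified.items()}

def to_rap_space(x, raps):
    """Assumed construction: offsets of x from every RAP, in sorted label order."""
    return [xi - ri for label in sorted(raps) for xi, ri in zip(x, raps[label])]

def euclidean(a, b):
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def rank(query, candidates):
    """Candidate names ordered from nearest to farthest from the query."""
    return sorted(candidates, key=lambda name: euclidean(query, candidates[name]))

# Hypothetical toy data: each vector is [price, popularity score].
source = {"inexpensive": [[4, 60], [6, 70]], "expensive": [[18, 80], [22, 60]]}
target = {"inexpensive": [[18, 60], [22, 70]], "expensive": [[33, 80], [37, 60]]}
source_raps = estimate_raps(source)
target_raps = estimate_raps(target)

query = [22, 60]  # an expensive item by source-domain standards
candidates = {"t1": [21, 70], "t2": [34, 62]}  # target-domain items

# Ranking on raw features ignores the price-level shift between domains.
raw_ranking = rank(query, candidates)

# Ranking after mapping each side relative to its own domain's RAPs.
rap_ranking = rank(
    to_rap_space(query, source_raps),
    {name: to_rap_space(v, target_raps) for name, v in candidates.items()},
)

print(raw_ranking)  # ['t1', 't2']
print(rap_ranking)  # ['t2', 't1']
```

In this toy setting the raw features place the query closest to `t1`, which is inexpensive by target-domain standards, whereas the RAP-relative representation ranks `t2` first because it plays the same role in its own domain as the query does in the source domain.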
