Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
What Can Pictures Tell Us About Web Pages? Improving Document Search Using Images

Abstract

Traditional Web search engines do not use the images in HTML pages to find relevant documents for a given query. Instead, they typically compute a measure of agreement between the keywords provided by the user and only the text portion of each page. In this paper we study whether the content of the pictures appearing in a Web page can be used to enrich the semantic description of an HTML document and consequently boost the performance of a keyword-based search engine. We present a Web-scalable system that uses a purely text-based search engine to find an initial set of candidate documents for a given query. The candidate set is then reranked using visual information extracted from the images contained in the pages. The resulting system retains the computational efficiency of traditional text-based search engines, with only a small additional storage cost needed to encode the visual information. We test our approach on one of the TREC Million Query Track benchmarks, where we show that exploiting visual content improves accuracy for two distinct text-based search engines, including the system with the best reported performance on this benchmark. We further validate our approach by collecting document relevance judgements on our search results using Amazon Mechanical Turk. The results of this experiment confirm the improvement in accuracy produced by our image-based reranker over a purely text-based system.
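
The two-stage design described in the abstract (text retrieval first, then visual reranking of the candidates) can be illustrated with a minimal Python sketch. The abstract does not say how the visual information is encoded or how it is combined with the text score, so everything here is an assumption made for illustration: the `Candidate` class, the per-image feature vectors, the linear per-query `query_weights`, and the mixing parameter `alpha` are all hypothetical and are not the authors' method.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Candidate:
    """A page returned by the first-stage text engine (hypothetical structure)."""
    doc_id: str
    text_score: float  # relevance score from the text-only engine
    # One feature vector per image on the page (assumed to be precomputed offline).
    image_features: List[Sequence[float]] = field(default_factory=list)


def visual_score(candidate: Candidate, query_weights: Sequence[float]) -> float:
    """Average linear score of the page's images; 0.0 if the page has no images."""
    if not candidate.image_features:
        return 0.0
    per_image = [
        sum(w * x for w, x in zip(query_weights, feats))
        for feats in candidate.image_features
    ]
    return sum(per_image) / len(per_image)


def rerank(candidates: List[Candidate],
           query_weights: Sequence[float],
           alpha: float = 0.5) -> List[Candidate]:
    """Reorder candidates by a convex mix of text and visual scores (assumed rule)."""
    def combined(c: Candidate) -> float:
        return (1 - alpha) * c.text_score + alpha * visual_score(c, query_weights)
    return sorted(candidates, key=combined, reverse=True)


# Toy example with made-up scores and 2-dimensional image features.
candidates = [
    Candidate("a", text_score=0.9),                                # no images
    Candidate("b", text_score=0.8, image_features=[[1.0, 0.0]]),   # matches query
    Candidate("c", text_score=0.7, image_features=[[0.0, 1.0]]),   # does not match
]
reranked = rerank(candidates, query_weights=[1.0, 0.0], alpha=0.5)
order = [c.doc_id for c in reranked]
```

In this toy setup, page "b" moves ahead of page "a" because its image agrees with the hypothetical visual model of the query, while the text engine alone would have ranked "a" first. Only the small per-image feature vectors need to be stored alongside the text index, which is in the spirit of the modest storage overhead the abstract mentions.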