Introducing a word vector space model and improve image features to web image-gathering


Abstract

Many methods recently proposed for recognizing real-world images require a large number of learning images. Collecting that many learning images is hard, however, so many conventional studies have restricted their targets to specific objects such as human faces and cars. In this paper, we therefore propose a system that automatically gathers a large number of real-world images from the World-Wide Web (WWW). We realize the system by making three improvements to the Image Collector, a Web-image-gathering system we proposed previously. (1) We use word vectors of the HTML files that embed the image files, in addition to image feature vectors extracted from the image files themselves. (2) Instead of a simple color histogram, we use the Earth Mover's Distance to compute the similarity between images. (3) To gather many more images, we extract words from all the HTML files that embed the final output images, select the ten words that appear most frequently in those files, and add them to the query keywords sent to text-based Web search engines.
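
Improvement (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the regex-based tag stripping, the tokenization, and the stop-word list are all assumptions made for illustration, since the abstract specifies only that the ten most frequent words are added to the query keywords.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say how words are filtered.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "for", "on"}


def strip_tags(html):
    """Crudely remove HTML tags (assumption: a real system would use a parser)."""
    return re.sub(r"<[^>]+>", " ", html)


def top_words(html_docs, n=10):
    """Return the n most frequent non-stop words across all HTML documents."""
    counts = Counter()
    for html in html_docs:
        words = re.findall(r"[a-z]+", strip_tags(html).lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def expand_query(keywords, html_docs, n=10):
    """Append the top-n frequent words not already present to the query keywords."""
    extra = [w for w in top_words(html_docs, n) if w not in keywords]
    return list(keywords) + extra
```

Calling `expand_query(["lion"], pages)` on the HTML of the pages that embed the accepted images would yield the original keyword followed by up to ten frequent words, which could then be sent to a text-based search engine for a further round of gathering.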