Conference: Artificial Intelligence and Applications

THOROUGH INDEXING OF IMAGES ON THE WORLD WIDE WEB

Abstract

The diversity of the World Wide Web requires intelligent automated tools to find useful information. We describe MARIE-4, a Web "crawler" and caption filter that searches the Web for text likely to be image captions, along with the associated image objects. Rather than examining only a few features, as existing systems do, it uses a broad set of criteria, including some novel ones, to yield higher recall than competing systems, which generally focus on high precision. We tested these criteria in careful experiments that extracted 8140 caption candidates for 4585 representative images, and quantified for the first time the relative value of several kinds of clues for captions. The crawler is self-improving: it gathers further statistics from experience and uses them as positive and negative clues. We index the results found by the crawler and provide a user interface. We have built demonstration implementations of a Web search engine covering all 667,573 publicly accessible U.S. Navy Web images and all 301,178 U.S. Army Web images.
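To make the idea of extracting caption candidates for an image concrete, the following is a minimal Python sketch and not the MARIE-4 implementation. It assumes two illustrative clue types (the `alt` attribute of an image tag and the words in the image file name); the abstract does not list the actual criteria. The names `CaptionCandidateParser` and `extract_candidates` are hypothetical.

```python
from html.parser import HTMLParser
import re


class CaptionCandidateParser(HTMLParser):
    """Collect (image source, candidate text, clue type) triples from HTML.

    The clue types used here are assumptions made for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return
        # Assumed clue 1: the alt attribute often describes the image.
        alt = (attr.get("alt") or "").strip()
        if alt:
            self.candidates.append((src, alt, "alt"))
        # Assumed clue 2: alphabetic words embedded in the image file name.
        stem = src.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        words = " ".join(w for w in re.split(r"[^A-Za-z]+", stem) if w)
        if words:
            self.candidates.append((src, words, "filename"))


def extract_candidates(html):
    """Return every caption candidate found in an HTML string."""
    parser = CaptionCandidateParser()
    parser.feed(html)
    return parser.candidates
```

A recall-oriented filter of the kind the abstract describes would keep many such candidates per image and weigh the clue types against one another, rather than discard everything except the single most likely caption.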
