Conference: Pacific-Rim Conference on Multimedia

Semantic R-CNN for Natural Language Object Detection

Abstract

In this paper, we present a simple and effective framework for natural language object detection, which localizes a target within an image based on a description of the target. The method, called semantic R-CNN, extends RPN (Region Proposal Network) [1] by adding an LSTM [20] module for processing the natural language query text. The LSTM [20] module takes the encoded query text and image descriptors as input and outputs the probability of the query text conditioned on the visual features of a candidate box and of the whole image. The candidate boxes are generated by the RPN, and their local features are extracted by ROI pooling. The RPN can be initialized from a pre-trained Faster R-CNN model [1], which transfers visual knowledge of objects from the traditional object detection domain to our task. Experimental results demonstrate that our method significantly outperforms the previous baseline, the SCRC (Spatial Context Recurrent ConvNet) model [7], on the Referit dataset [8]. Moreover, like Faster R-CNN, our model is simple to train.
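
The scoring step described in the abstract can be illustrated with a toy sketch: each candidate box is ranked by the log-probability an LSTM assigns to the query text given that box's features and a whole-image feature. This is not the authors' implementation. The names `TinyLSTMScorer` and `rank_boxes`, all dimensions, the start-token convention, and the random (untrained) weights are assumptions made for illustration, and the hand-written feature vectors stand in for what RPN and ROI pooling would produce.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMScorer:
    """Hypothetical toy LSTM language model conditioned on visual features."""

    def __init__(self, vocab_size, embed_dim, visual_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.embed = rand_matrix(vocab_size, embed_dim, rng)
        in_dim = embed_dim + visual_dim + hidden_dim
        # One weight matrix per gate: input, forget, output, candidate.
        self.gates = [rand_matrix(hidden_dim, in_dim, rng) for _ in range(4)]
        self.out = rand_matrix(vocab_size, hidden_dim, rng)

    def step(self, token, visual, h, c):
        # Concatenate word embedding, visual descriptor, and previous state.
        x = self.embed[token] + visual + h
        i, f, o = (list(map(sigmoid, matvec(W, x))) for W in self.gates[:3])
        g = [math.tanh(v) for v in matvec(self.gates[3], x)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

    def log_prob(self, tokens, visual):
        """Return log p(tokens | visual) as a sum of per-word log-softmax terms."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        prev, total = 0, 0.0  # assumption: token id 0 marks start of sentence
        for tok in tokens:
            h, c = self.step(prev, visual, h, c)
            logits = matvec(self.out, h)
            m = max(logits)
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            total += logits[tok] - log_z
            prev = tok
        return total


def rank_boxes(scorer, query_tokens, box_features, global_feature):
    """Score every candidate box and return (index of best box, all scores)."""
    scores = [scorer.log_prob(query_tokens, box + global_feature)
              for box in box_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Stand-ins for ROI-pooled box features and a whole-image feature.
boxes = [[0.9, 0.1, 0.0], [0.2, 0.7, 0.4]]
global_feat = [0.5, 0.5]
scorer = TinyLSTMScorer(vocab_size=6, embed_dim=4, visual_dim=5, hidden_dim=8)
best, scores = rank_boxes(scorer, [3, 1, 4], boxes, global_feat)
print(best, scores)
```

With trained weights, the box with the highest score would be returned as the localization of the query. With the random weights used here, the ranking carries no meaning and only demonstrates the data flow.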
