International Joint Conference on Natural Language Processing

Draw and Tell: Multimodal Descriptions Outperform Verbal- or Sketch-Only Descriptions in an Image Retrieval Task

Abstract

While language conveys meaning largely symbolically, actual communication acts typically contain iconic elements as well: People gesture while they speak, or may even draw sketches while explaining something. Image retrieval prima facie seems like a task that could profit from combined symbolic and iconic reference, but it is typically set up to work either from language only, or via (iconic) sketches with no verbal contribution. Using a model of grounded language semantics and a model of sketch-to-image mapping, we show that adding even very reduced iconic information to a verbal image description improves recall. Verbal descriptions paired with fully detailed sketches still perform better than these sketches alone. We see these results as supporting the assumption that natural user interfaces should respond to multimodal input, where possible, rather than just language alone.
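The abstract does not spell out how the verbal and iconic signals are combined. One common way to realize such a setup, sketched below purely as an illustration, is late fusion: score each candidate image separately against the verbal description and the sketch, then rank by a weighted sum. The encoders, the shared cosine-similarity spaces, and the weight alpha are all assumptions for this sketch, not the paper's actual models of grounded language semantics and sketch-to-image mapping.

    import numpy as np

    def cosine_sim(query, images):
        """Cosine similarity between one query vector and N image vectors (N x d)."""
        query = query / np.linalg.norm(query)
        images = images / np.linalg.norm(images, axis=1, keepdims=True)
        return images @ query

    def multimodal_rank(text_q, sketch_q, img_text_emb, img_sketch_emb, alpha=0.5):
        """Rank images by a weighted sum of verbal and sketch similarity.

        text_q, sketch_q  -- query embeddings from two hypothetical encoders
        img_text_emb      -- (N, d_t) image embeddings in the text-matching space
        img_sketch_emb    -- (N, d_s) image embeddings in the sketch-matching space
        alpha             -- interpolation weight between the two modalities
        """
        scores = (alpha * cosine_sim(text_q, img_text_emb)
                  + (1 - alpha) * cosine_sim(sketch_q, img_sketch_emb))
        return np.argsort(-scores)  # image indices, best match first

    def recall_at_k(ranking, target_idx, k=10):
        """Recall@k for one query: 1 if the target image is among the top k."""
        return int(target_idx in ranking[:k])

    # Toy usage with random data: 100 candidate images, 64-dim spaces.
    rng = np.random.default_rng(0)
    ranking = multimodal_rank(rng.normal(size=64), rng.normal(size=64),
                              rng.normal(size=(100, 64)),
                              rng.normal(size=(100, 64)))
    print(recall_at_k(ranking, target_idx=3, k=10))

Under this reading, the paper's finding that "even very reduced iconic information" helps corresponds to the sketch term contributing useful signal to the fused score even when the sketch embedding is coarse.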