COMIC: Toward A Compact Image Captioning Model With Attention

Tan Jia Huei; Chan Chee Seng; Chuah Joon Huang

Abstract

Recent works in image captioning have shown very promising raw performance. However, we realize that most of these encoder-decoder style networks with attention do not scale naturally to large vocabulary sizes, making them difficult to deploy on embedded systems with limited hardware resources. This is because the sizes of the word and output embedding matrices grow proportionally with the size of the vocabulary, adversely affecting the compactness of these networks. To address this limitation, this paper introduces a brand new idea in the domain of image captioning. That is, we tackle the problem of compactness of image captioning models, which has hitherto been unexplored. We show that our proposed model, named COMIC for compact image captioning, achieves results comparable to state-of-the-art approaches in five common evaluation metrics on both the MS-COCO and InstaPIC-1.1M datasets, despite having an embedded vocabulary size that is 39x-99x smaller.
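
The scaling argument in the abstract can be illustrated with a small parameter count. The sketch below is not the COMIC method and uses no figures from the paper: the function name `vocab_dependent_params` and the dimensions (512 for both the embedding and the decoder hidden state) are assumptions chosen only to show that the two vocabulary-dependent matrices grow linearly with vocabulary size.

```python
def vocab_dependent_params(vocab_size: int, embed_dim: int, hidden_dim: int) -> dict:
    """Count parameters in the two matrices whose size depends on the vocabulary.

    Hypothetical helper for illustration; dimensions are assumptions,
    not values reported in the paper.
    """
    # Input word embedding: one embed_dim-sized vector per vocabulary entry.
    word_embedding = vocab_size * embed_dim
    # Output projection: hidden state -> logits over the vocabulary, plus bias.
    output_projection = hidden_dim * vocab_size + vocab_size
    return {
        "word_embedding": word_embedding,
        "output_projection": output_projection,
        "total": word_embedding + output_projection,
    }


if __name__ == "__main__":
    # Assumed sizes: a 10x larger vocabulary gives 10x more parameters here.
    for vocab_size in (1_000, 10_000):
        counts = vocab_dependent_params(vocab_size, embed_dim=512, hidden_dim=512)
        print(vocab_size, counts["total"])
```

Under these assumed dimensions, shrinking the embedded vocabulary shrinks both matrices by the same factor, which is the kind of saving the abstract refers to.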

Bibliographic record

Source
IEEE Transactions on Multimedia, 2019, No. 10, pp. 2686-2696 (11 pages)
Authors
Tan Jia Huei; Chan Chee Seng; Chuah Joon Huang
Author affiliations

Univ Malaya, Ctr Image & Signal Proc, Dept Artificial Intelligence, Fac Comp Sci & Informat Technol, Kuala Lumpur 50603, Malaysia;

Univ Malaya, Dept Elect Engn, Fac Engn, Kuala Lumpur 50603, Malaysia

Original format: PDF
Language: English
Keywords
Image captioning; deep compression network; deep learning
