International Conference on Information and Communication Technology

Adaptive Attention Generation for Indonesian Image Captioning



Abstract

Image captioning is one of the most widely discussed topics today. However, most research in this area generates English captions, while thousands of languages exist around the world. Given each language's uniqueness, dedicated research is needed to generate captions in those languages. Indonesia, the largest Southeast Asian country, has its own language, Bahasa Indonesia, which is taught in countries such as Vietnam, Australia, and Japan. In this research, we propose an attention-based image captioning model for the Indonesian image captioning task, using ResNet101 as the encoder and an LSTM with adaptive attention as the decoder. Adaptive attention decides when, and at which region of the image, the model should attend to produce the next word. The model was trained on the MSCOCO and Flickr30k datasets, both translated into Bahasa Indonesia manually by humans and with Google Translate. Our research yielded scores of 0.678, 0.512, 0.375, 0.274, and 0.990 for BLEU-1, BLEU-2, BLEU-3, BLEU-4, and CIDEr, respectively. These scores are similar to those of English image captioning models, which suggests our model is on par with English image captioning. We also propose a new metric by conducting a survey: its results state that 76.8% of our model's captions are better than validation data translated with Google Translate.
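The abstract's core mechanism, adaptive attention, mixes a "visual sentinel" (what the LSTM already knows from language) with spatially attended image features, so the decoder can choose not to look at the image for function words. The following is a minimal numpy sketch of one such attention step in the style of Lu et al.'s adaptive attention; all weight names (`W_v`, `W_s`, `W_g`, `w_h`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(V, s, h, W_v, W_s, W_g, w_h):
    """One adaptive-attention step (illustrative sketch, not the paper's code).
    V: (k, d) spatial image features from the CNN encoder
    s: (d,)   visual sentinel produced by the LSTM
    h: (d,)   current LSTM hidden state
    Returns the mixed context vector and the sentinel gate beta
    (beta close to 1 means "rely on language, not the image")."""
    k = V.shape[0]
    # attention logits over the k image regions
    z = np.tanh(V @ W_v + (h @ W_g)[None, :]) @ w_h      # shape (k,)
    # an extra logit for the sentinel: the option to attend to no region
    z_s = np.tanh(s @ W_s + h @ W_g) @ w_h               # scalar
    alpha = softmax(np.concatenate([z, [z_s]]))          # shape (k+1,)
    beta = alpha[-1]                                     # sentinel gate
    c = alpha[:k] @ V                                    # visual context
    c_hat = beta * s + (1.0 - beta) * c                  # mixed context
    return c_hat, beta
```

Because the sentinel competes in the same softmax as the image regions, `beta` is learned jointly with the spatial attention weights rather than predicted by a separate gate.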
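The reported BLEU-1 through BLEU-4 scores are clipped n-gram precisions with a brevity penalty. As a rough illustration of what the BLEU-1 figure of 0.678 measures, here is a minimal single-sentence unigram BLEU in pure Python (real evaluation uses corpus-level statistics and multiple references, so this is only a sketch):

```python
from collections import Counter
import math

def bleu1(candidate, reference):
    """Sentence-level BLEU-1: clipped unigram precision times a
    brevity penalty. candidate and reference are token lists."""
    cand, ref = Counter(candidate), Counter(reference)
    # clip each word's count by its count in the reference
    overlap = sum(min(n, ref[w]) for w, n in cand.items())
    precision = overlap / max(len(candidate), 1)
    # brevity penalty discourages overly short candidates
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * precision
```

For example, an exact match scores 1.0, and a candidate sharing half its tokens with an equal-length reference scores 0.5.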
