International Conference on Information and Communication Technology

Adaptive Attention Generation for Indonesian Image Captioning



Abstract

Image captioning is one of the most widely discussed topics today. However, most research in this area generates English captions, while thousands of languages exist around the world. Given each language's uniqueness, dedicated research is needed to generate captions in those languages. Indonesia, the largest Southeast Asian country, has its own language, Bahasa Indonesia, which is taught in countries such as Vietnam, Australia, and Japan. In this research, we propose an attention-based image captioning model for the Indonesian image captioning task, using ResNet101 as the encoder and an LSTM with adaptive attention as the decoder. Adaptive attention decides when, and at which region of the image, the model should attend to produce the next word. The model was trained on the MSCOCO and Flickr30k datasets, both translated into Bahasa Indonesia manually by humans and with Google Translate. Our research yielded scores of 0.678, 0.512, 0.375, 0.274, and 0.990 for BLEU-1, BLEU-2, BLEU-3, BLEU-4, and CIDEr, respectively. These scores are similar to those of English image captioning models, which suggests our model is on par with English image captioning. We also propose a new metric by conducting a survey: its results state that 76.8% of our model's captions are better than validation data translated with Google Translate.
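The abstract's core mechanism, adaptive attention, mixes a "visual sentinel" (what the LSTM already knows from language) with spatially attended image features, so the decoder can choose not to look at the image for function words. The following is a minimal numpy sketch of one such attention step in the style of Lu et al.'s adaptive attention; all weight names (`W_v`, `W_s`, `W_g`, `w_h`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(V, s, h, W_v, W_s, W_g, w_h):
    """One adaptive-attention step (illustrative sketch, not the paper's code).
    V: (k, d) spatial image features from the CNN encoder
    s: (d,)   visual sentinel produced by the LSTM
    h: (d,)   current LSTM hidden state
    Returns the mixed context vector and the sentinel gate beta
    (beta close to 1 means "rely on language, not the image")."""
    k = V.shape[0]
    # attention logits over the k image regions
    z = np.tanh(V @ W_v + (h @ W_g)[None, :]) @ w_h      # shape (k,)
    # an extra logit for the sentinel: the option to attend to no region
    z_s = np.tanh(s @ W_s + h @ W_g) @ w_h               # scalar
    alpha = softmax(np.concatenate([z, [z_s]]))          # shape (k+1,)
    beta = alpha[-1]                                     # sentinel gate
    c = alpha[:k] @ V                                    # visual context
    c_hat = beta * s + (1.0 - beta) * c                  # mixed context
    return c_hat, beta
```

Because the sentinel competes in the same softmax as the image regions, `beta` is learned jointly with the spatial attention weights rather than predicted by a separate gate.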
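The reported BLEU-1 through BLEU-4 scores are clipped n-gram precisions with a brevity penalty. As a rough illustration of what the BLEU-1 figure of 0.678 measures, here is a minimal single-sentence unigram BLEU in pure Python (real evaluation uses corpus-level statistics and multiple references, so this is only a sketch):

```python
from collections import Counter
import math

def bleu1(candidate, reference):
    """Sentence-level BLEU-1: clipped unigram precision times a
    brevity penalty. candidate and reference are token lists."""
    cand, ref = Counter(candidate), Counter(reference)
    # clip each word's count by its count in the reference
    overlap = sum(min(n, ref[w]) for w, n in cand.items())
    precision = overlap / max(len(candidate), 1)
    # brevity penalty discourages overly short candidates
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * precision
```

For example, an exact match scores 1.0, and a candidate sharing half its tokens with an equal-length reference scores 0.5.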
