Mathematical Problems in Engineering: Theory, Methods and Applications

Deep Visual Semantic Embedding with Text Data Augmentation and Word Embedding Initialization



Abstract

Language and vision are the two most essential components of human intelligence for interpreting the real world around us, and how to connect language and vision is a key question in current research. Multimodal methods such as visual semantic embedding, which unify images and their corresponding texts in a shared feature space, have been widely studied recently. Inspired by recent developments in text data augmentation, in particular a simple but powerful technique called EDA (easy data augmentation), we can use EDA to expand the information in the given data and improve model performance. In this paper, we exploit text data augmentation and word embedding initialization for multimodal retrieval: we apply EDA to augment the text data, use word embedding initialization for the text encoder based on recurrent neural networks, and minimize the gap between the two feature spaces with a triplet ranking loss using hard negative mining. On two Flickr-based datasets, we achieve the same recall with only 60% of the training data as normal training with the full dataset. Experimental results show the improvement of the proposed model: on all datasets in this paper (Flickr8k, Flickr30k, and MS-COCO), our model performs better on image annotation and image retrieval tasks. The experiments also show that text data augmentation is more suitable for smaller datasets, while word embedding initialization is more suitable for larger ones.
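EDA, as originally described by Wei and Zou, consists of four token-level operations: synonym replacement, random insertion, random swap, and random deletion. A minimal sketch in Python is shown below; the toy synonym table is a placeholder assumption (the published EDA draws synonyms from WordNet), and function names are illustrative rather than taken from the paper.

```python
import random

def synonym_replacement(tokens, synonyms, n=1):
    """Replace up to n tokens that have an entry in the synonym table."""
    tokens = tokens[:]
    candidates = [i for i, t in enumerate(tokens) if t in synonyms]
    random.shuffle(candidates)
    for i in candidates[:n]:
        tokens[i] = random.choice(synonyms[tokens[i]])
    return tokens

def random_insertion(tokens, synonyms, n=1):
    """Insert a synonym of a random token at a random position, n times."""
    tokens = tokens[:]
    for _ in range(n):
        candidates = [t for t in tokens if t in synonyms]
        if not candidates:
            break
        word = random.choice(synonyms[random.choice(candidates)])
        tokens.insert(random.randrange(len(tokens) + 1), word)
    return tokens

def random_swap(tokens, n=1):
    """Swap two randomly chosen token positions, n times."""
    tokens = tokens[:]
    for _ in range(n):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    """Delete each token with probability p; always keep at least one."""
    kept = [t for t in tokens if random.random() > p]
    return kept if kept else [random.choice(tokens)]
```

Each augmented caption is a cheap, label-preserving paraphrase of the original, which is why EDA pays off most on small datasets where captions are scarce.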
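The triplet ranking loss with hard negative mining mentioned in the abstract can be sketched as the max-of-hinges objective popularized by VSE++; the NumPy version below is an illustrative assumption of that formulation (the paper's exact loss may differ in details), with image and caption embeddings assumed L2-normalized so dot products equal cosine similarities.

```python
import numpy as np

def hard_negative_triplet_loss(img, cap, margin=0.2):
    """Triplet ranking loss using only the hardest in-batch negatives.

    img, cap: (batch, dim) L2-normalized embeddings, where img[i]
    matches cap[i]. For each positive pair, only the single most
    similar non-matching item in the batch contributes to the loss.
    """
    scores = img @ cap.T                    # pairwise cosine similarities
    pos = np.diag(scores)                   # matching-pair scores
    # Mask the diagonal so positives are never selected as negatives.
    mask = np.eye(len(scores), dtype=bool)
    neg = np.where(mask, -np.inf, scores)
    hardest_cap = neg.max(axis=1)           # hardest caption per image
    hardest_img = neg.max(axis=0)           # hardest image per caption
    cost_cap = np.maximum(0.0, margin + hardest_cap - pos)
    cost_img = np.maximum(0.0, margin + hardest_img - pos)
    return (cost_cap + cost_img).mean()
```

Focusing the hinge on the hardest negative, rather than summing over all negatives, is what makes this loss effective for cross-modal retrieval benchmarks such as the recall metrics reported above.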
