Jointly modeling embedding and translation to bridge video and language
Abstract
Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given the previous words and the visual content, and can create a visual-semantic embedding space for enforcing the relationship between the semantics of an entire sentence and the visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning a powerful video representation, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.
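The joint objective implied above combines two terms: a relevance term that pulls the video and sentence embeddings together in the shared visual-semantic space, and a coherence term that rewards assigning high probability to each ground-truth next word given the previous words and the visual content. A minimal sketch of such a combined loss, assuming precomputed embeddings and per-word probabilities (the function and parameter names, including the weighting `lam`, are hypothetical and not taken from the patent):

```python
import numpy as np

def joint_loss(video_emb, sentence_emb, next_word_probs, lam=0.5):
    """Illustrative combined training objective in the spirit of LSTM-E.

    video_emb / sentence_emb: vectors in the shared visual-semantic space.
    next_word_probs: model probabilities of each ground-truth word given
    the previous words and the visual content.
    lam: hypothetical trade-off weight between the two terms.
    """
    # Relevance: squared Euclidean distance between the video and
    # sentence embeddings (smaller = better semantic alignment).
    relevance = float(np.sum((video_emb - sentence_emb) ** 2))
    # Coherence: negative log-likelihood of the ground-truth sentence.
    coherence = float(-np.sum(np.log(next_word_probs)))
    return (1 - lam) * relevance + lam * coherence

# Perfectly predicted words (probability 1) leave only the relevance term.
loss = joint_loss(np.array([1.0, 0.0]), np.array([0.0, 0.0]),
                  np.array([1.0, 1.0]), lam=0.5)
```

Minimizing the relevance term shapes the embedding space so whole-sentence semantics match the visual content, while the coherence term keeps the generated word sequence fluent; training on the weighted sum pursues both at once.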