Video retrieval system using adaptive spatiotemporal convolution feature representation with dynamic abstraction for video to language translation

Abstract

A video retrieval system is provided that includes a set of servers configured to retrieve a video sequence from a database and forward it to a requesting device in response to a match between an input text and a caption for the video sequence. The servers are further configured to translate the video sequence into the caption by (A) applying a three-dimensional convolutional neural network (C3D) to image frames of the video sequence to obtain (i) intermediate feature representations across L convolutional layers and (ii) top-layer features, (B) producing a first word of the caption by applying the top-layer features to a long short-term memory (LSTM) network, and (C) producing subsequent words of the caption by (i) dynamically performing spatiotemporal attention and layer attention using the intermediate feature representations to form a context vector, and (ii) applying the LSTM to the context vector, a previous word of the caption, and a hidden state of the LSTM.
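
Step (C)(i) of the abstract can be illustrated with a minimal sketch. The abstract does not say how attention scores are computed, so the dot-product scoring below is an assumption. The sketch also assumes that the features of every layer have already been projected to the same dimension as the LSTM hidden state. All function names (`softmax`, `dot`, `weighted_sum`, `context_vector`) are hypothetical and are not taken from the patent.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Sum of vectors, each scaled by its weight."""
    dim = len(vectors[0])
    return [sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)]


def context_vector(layer_features, hidden):
    """Form a context vector from L layers of features.

    layer_features: list of L layers; each layer is a list of feature
        vectors, one per spatiotemporal location (assumed to share the
        dimension of `hidden`).
    hidden: current LSTM hidden state.
    Returns (context, layer_weights).
    """
    per_layer = []
    for feats in layer_features:
        # Spatiotemporal attention: weight the locations within one layer.
        # Dot-product scoring is an assumption of this sketch.
        alpha = softmax([dot(f, hidden) for f in feats])
        per_layer.append(weighted_sum(alpha, feats))
    # Layer attention: weight the L per-layer summaries.
    beta = softmax([dot(s, hidden) for s in per_layer])
    return weighted_sum(beta, per_layer), beta


# Two layers with different numbers of locations, feature dimension 2.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
]
ctx, beta = context_vector(layers, hidden=[1.0, 0.0])
```

Because both attention steps depend on the hidden state, the weights are recomputed for every generated word, which is one way to read "dynamically" in the abstract. The resulting context vector would then be passed to the LSTM together with the previous word and the hidden state, as in step (C)(ii).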

Bibliographic data

Publication number: US10402658B2

Patent type:
Publication date: 2019-09-03

Original format: PDF
Applicant/patentee: NEC LABORATORIES AMERICA INC.

Application number: US201715794802
Inventors: RENQIANG MIN; YUNCHEN PU

Filing date: 2017-10-26
Classification codes: G06K9; G06K9/46; G06N3/04; G06K9/66; H04N5/278; G06K9/62; H04N21/218; H04N21/234; H04N21/488; G06K9/72; H04N7/18
Country: US
Date added to database: 2022-08-21 12:13:29
