SEMANTIC MULTISENSORY EMBEDDINGS FOR VIDEO SEARCH BY TEXT
Abstract
A method of embedding video for text search includes extracting visual features from a video. The visual features may, for example, include appearance information, motion, audio, and/or similar features. Term vectors are determined from textual descriptions associated with the video. The text may, for example, be included in a title for the video or within the video itself (e.g., subtitles). A feature projection is computed based on the extracted video features, and a textual projection is computed based on the term vectors. A semantic embedding is computed based on the feature projection and the textual projection by jointly optimizing semantic predictability and semantic descriptiveness.
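The joint optimization described in the abstract can be illustrated with a small numerical sketch. The abstract does not state the loss functions or the solver, so everything below is an assumption made for illustration: descriptiveness is taken as the squared error of reconstructing the term vectors Y from the embedding S through a textual projection A, predictability is taken as the squared error of predicting S from the video features X through a feature projection W, and the two are minimized together by alternating ridge-regression updates. The names `fit_embedding` and `search`, the toy data, the SVD-based initialization, and the regularization weight `reg` are all hypothetical.

```python
import numpy as np

def fit_embedding(X, Y, k=2, reg=1e-4, sweeps=20):
    """Alternating least squares for an assumed objective:
    ||Y - S A||^2 (descriptiveness) + ||S - X W||^2 (predictability) + ridge penalty.
    X: (n, d) video features, Y: (n, m) term vectors, k <= min(n, m)."""
    n, d = X.shape
    # Initialise the embedding S from the top-k SVD factors of Y (a choice made for this sketch).
    U, sing, _ = np.linalg.svd(Y, full_matrices=False)
    S = U[:, :k] * sing[:k]
    I_k, I_d = np.eye(k), np.eye(d)
    history = []
    for _ in range(sweeps):
        # Textual projection A (k x m): ridge fit so that S @ A reconstructs Y.
        A = np.linalg.solve(S.T @ S + reg * I_k, S.T @ Y)
        # Feature projection W (d x k): ridge fit so that X @ W predicts S.
        W = np.linalg.solve(X.T @ X + reg * I_d, X.T @ S)
        # Embedding S (n x k): closed-form minimiser of the joint objective given A and W.
        S = np.linalg.solve(A @ A.T + (1 + reg) * I_k, (Y @ A.T + X @ W).T).T
        descriptiveness = np.sum((Y - S @ A) ** 2)
        predictability = np.sum((S - X @ W) ** 2)
        penalty = reg * (np.sum(A ** 2) + np.sum(W ** 2) + np.sum(S ** 2))
        history.append(descriptiveness + predictability + penalty)
    return W, A, S, history

def search(query_terms, X, W, A):
    """Score each video against a query term vector inside the embedding space."""
    video_embeddings = X @ W           # (n, k), computed from video features only
    query_embedding = A @ query_terms  # (k,), computed from text only
    return video_embeddings @ query_embedding

# Toy data (hypothetical): 4 videos, 3 fused features, vocabulary ["dog", "bark", "car"].
X = np.array([[1.0, 0.0, 0.2],
              [1.0, 0.0, 0.7],
              [0.0, 1.0, 0.1],
              [0.0, 1.0, 0.5]])
Y = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])

W, A, S, history = fit_embedding(X, Y)
car_query = np.array([0.0, 0.0, 1.0])
scores = search(car_query, X, W, A)
print("videos ranked for 'car':", np.argsort(-scores))
```

At query time only `W` and `A` are needed: a video is embedded from its features alone, so videos without any title or subtitle text can still be ranked against a text query. In this toy setup the two videos labeled "car" should receive the highest scores.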