
Learning multimedia semantics from large-scale unstructured data


Abstract

Systems and methods for learning topic models from unstructured data and applying the learned topic models to recognize semantics for new data items are described herein. In at least one embodiment, a corpus of multimedia data items associated with a set of labels may be processed to generate a refined corpus of multimedia data items associated with the set of labels. Such processing may include arranging the multimedia data items in clusters based on similarities of extracted multimedia features and generating intra-cluster and inter-cluster features. The intra-cluster and the inter-cluster features may be used for removing multimedia data items from the corpus to generate the refined corpus. The refined corpus may be used for training topic models for identifying labels. The resulting models may be stored and subsequently used for identifying semantics of a multimedia data item input by a user.
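
The refinement step described in the abstract (cluster the items, derive intra-cluster and inter-cluster features, then remove items) can be illustrated with a minimal sketch. The abstract does not say which clustering method, features, or removal rule are used, so everything concrete here is an assumption made for illustration: a small k-means with farthest-point initialization, distance to the item's own centroid as the intra-cluster feature, distance to the nearest other centroid as the inter-cluster feature, and a hypothetical `max_ratio` threshold on their ratio. The names `kmeans`, `refine_corpus`, and the toy corpus are likewise hypothetical. Topic-model training on the refined corpus is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Item = Tuple[Vector, str]  # (extracted feature vector, label)


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p: Vector, centroids: List[List[float]]) -> int:
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))


def kmeans(points: List[Vector], k: int, iters: int = 20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centroids) for p in points], centroids


def refine_corpus(items: List[Item], k: int, max_ratio: float = 0.5) -> List[Item]:
    """Drop items that do not sit clearly inside their own cluster (needs k >= 2).

    Assumed features, not taken from the patent:
      intra = distance to the item's own cluster centroid
      inter = distance to the nearest other cluster centroid
    An item is kept only if intra / inter <= max_ratio.
    """
    points = [features for features, _ in items]
    assign, centroids = kmeans(points, k)
    refined = []
    for (features, label), own in zip(items, assign):
        intra = dist(features, centroids[own])
        inter = min(dist(features, c) for i, c in enumerate(centroids) if i != own)
        if inter > 0 and intra / inter <= max_ratio:
            refined.append((features, label))
    return refined


# Toy corpus: two tight groups plus one item lying between them.
corpus = [
    ((0.0, 0.0), "cat"), ((0.0, 1.0), "cat"), ((1.0, 0.0), "cat"), ((1.0, 1.0), "cat"),
    ((10.0, 10.0), "dog"), ((10.0, 11.0), "dog"), ((11.0, 10.0), "dog"), ((11.0, 11.0), "dog"),
    ((5.0, 5.5), "cat"),  # ambiguous item, roughly equidistant from both groups
]
refined = refine_corpus(corpus, k=2)
print(len(corpus), "items before refinement,", len(refined), "after")
```

In this toy setup the item that lies between the two groups is the one removed, because its distance to its own centroid is comparable to its distance to the other centroid. A real system would work on extracted multimedia features rather than 2-D points, and the refined corpus would then feed the topic-model training that the abstract mentions.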


Bibliographic data

Publication number: US9875301B2
Publication date: 2018-01-23
Original format: PDF
Applicant/assignee: MICROSOFT TECHNOLOGY LICENSING LLC
Application number: US201414266228
Inventors: XIAN-SHENG HUA; JIN LI; YOSHITAKA USHIKU
Filing date: 2014-04-30
Classification: G06N99; G06F17/30
Country: US
Date indexed: 2022-08-21 12:56:10
