COMPARING DOCUMENT CONTENTS USING A CONSTRUCTED TOPIC MODEL

Abstract

Comparing document contents is provided. An ontological concept is extracted from a text snippet of a corpus document. One or more feature vectors are constructed that include associative information that describes an ontology that includes the focused concept. A topic model is trained using the one or more feature vectors. First and second topic sets are respectively extracted from first and second documents using the topic model. One or more topics from the first topic set are matched, using the topic model, with one or more topics from the second topic set to construct a matched topic set. Semantic analyses are respectively performed on first and second text snippet sets, wherein the first and second text snippet sets are chosen based, at least in part, on the matched topic set. Text snippets are matched based, at least in part, on the first and second semantic analyses.
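
The later steps of the abstract (extracting topic sets from two documents, matching topics, then matching text snippets chosen by the matched topics) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the hand-written `TOPICS` table stands in for a trained topic model, topic matching is reduced to intersecting topic labels, Jaccard word overlap stands in for the semantic analysis, and all function names are hypothetical. The ontology-based feature vectors and the training step are not shown.

```python
import re
from collections import Counter

# Hypothetical stand-in for a trained topic model: topic label -> word weights.
TOPICS = {
    "billing": {"invoice": 1.0, "payment": 1.0, "fee": 0.8},
    "privacy": {"data": 1.0, "privacy": 1.0, "consent": 0.8},
    "shipping": {"delivery": 1.0, "shipping": 1.0, "courier": 0.8},
}


def tokens(text):
    """Lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def snippets(doc):
    """Split a document into sentence-like text snippets."""
    return [s.strip() for s in re.split(r"[.!?]", doc) if s.strip()]


def topic_score(text, topic):
    """Weighted count of the topic's words occurring in the text."""
    counts = Counter(tokens(text))
    return sum(weight * counts[word] for word, weight in TOPICS[topic].items())


def extract_topics(doc, threshold=1.0):
    """Topic set of a document: topics scoring at or above the threshold."""
    return {t for t in TOPICS if topic_score(doc, t) >= threshold}


def jaccard(a, b):
    """Word-set overlap, used here in place of a real semantic analysis."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def compare(doc1, doc2, min_sim=0.2):
    """Return (matched topic set, list of (topic, snippet1, snippet2))."""
    # Simplification: a topic "matches" when both documents contain it.
    matched = extract_topics(doc1) & extract_topics(doc2)
    pairs = []
    for topic in sorted(matched):
        # Snippet sets are chosen based on the matched topic.
        s1 = [s for s in snippets(doc1) if topic_score(s, topic) > 0]
        s2 = [s for s in snippets(doc2) if topic_score(s, topic) > 0]
        for a in s1:
            best = max(s2, key=lambda b: jaccard(a, b), default=None)
            if best is not None and jaccard(a, best) >= min_sim:
                pairs.append((topic, a, best))
    return matched, pairs
```

Calling `compare(doc1, doc2)` on two short documents returns the topics they share and, for each shared topic, the snippet pairs whose word overlap reaches `min_sim`. Restricting snippet comparison to matched topics keeps the pairwise comparison small, which appears to be the motivation for the two-stage design the abstract describes.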

Bibliographic data

Publication number: US2015310096A1
Publication date: 2015-10-29
Original format: PDF
Applicant/Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Application number: US201514695688
Inventors: WEI HONG QIAN; ZHI LI GUO; DAVIDE PASETTO; HONG LEI GUO; SHENGHUA BAO; ZHONG SU
Filing date: 2015-04-24
Classification: G06F17/30
Country: US
Date indexed: 2022-08-21 15:26:08