COMPARING DOCUMENT CONTENTS USING A CONSTRUCTED TOPIC MODEL

Abstract

A technique for comparing document contents is provided. An ontological concept is extracted from a text snippet of a corpus document. One or more feature vectors are constructed; these include associative information describing an ontology that includes the focused concept. A topic model is trained using the one or more feature vectors. First and second topic sets are extracted from first and second documents, respectively, using the topic model. One or more topics from the first topic set are matched, using the topic model, with one or more topics from the second topic set to construct a matched topic set. Semantic analyses are performed on first and second text snippet sets, respectively, wherein the first and second text snippet sets are chosen based, at least in part, on the matched topic set. Text snippets are matched based, at least in part, on the first and second semantic analyses.
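
The matching stages described in the abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes the topic sets have already been extracted by some trained topic model, represents each topic as a hypothetical term-to-weight dictionary, uses cosine similarity as a stand-in for topic matching, and uses Jaccard token overlap as a stand-in for semantic analysis. All names, thresholds, and sample data below are assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_topics(first_topics, second_topics, threshold=0.5):
    """Pair each first-set topic with its most similar second-set topic.

    Assumption: similarity of term-weight vectors approximates topic matching.
    """
    matched = []
    for name_a, vec_a in first_topics.items():
        best_name, best_score = None, 0.0
        for name_b, vec_b in second_topics.items():
            score = cosine(vec_a, vec_b)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_name is not None and best_score >= threshold:
            matched.append((name_a, best_name, best_score))
    return matched


def jaccard(text_a, text_b):
    """Token-overlap score; a crude placeholder for real semantic analysis."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def match_snippets(first_snips, second_snips, matched, threshold=0.3):
    """Compare only snippets that belong to matched topic pairs."""
    pairs = []
    for topic_a, topic_b, _ in matched:
        for snip_a in first_snips.get(topic_a, []):
            for snip_b in second_snips.get(topic_b, []):
                if jaccard(snip_a, snip_b) >= threshold:
                    pairs.append((snip_a, snip_b))
    return pairs


# Hypothetical topic sets, as if extracted from two documents.
first_topics = {
    "a1": {"engine": 0.6, "fuel": 0.4},
    "a2": {"court": 0.7, "judge": 0.3},
}
second_topics = {
    "b1": {"fuel": 0.5, "engine": 0.5},
    "b2": {"recipe": 1.0},
}

# Hypothetical snippets grouped by the topic they were assigned to.
first_snips = {
    "a1": ["the engine burns fuel efficiently"],
    "a2": ["the judge ruled"],
}
second_snips = {
    "b1": ["this engine burns fuel cleanly", "fuel prices rose"],
}

matched = match_topics(first_topics, second_topics)
pairs = match_snippets(first_snips, second_snips, matched)
```

Restricting the snippet comparison to matched topic pairs mirrors the abstract's ordering, where the snippet sets are chosen based on the matched topic set before any snippet-level analysis runs. The ontology-based feature construction and topic-model training steps are omitted here because the abstract does not specify them in enough detail to sketch.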
