Method and Apparatus for Computing Similarity Between Cross-Field Documents

Abstract

A method includes: storing documents from different fields together with the relationship between any two documents from different fields; performing word segmentation and stop-word removal on the documents to obtain a vocabulary data set; constructing an incidence matrix between the documents according to those relationships; obtaining a topic cluster for the documents according to the vocabulary data set; obtaining, according to the incidence matrix and the topic cluster, the probability that any topic in the topic cluster appears in any document and a matching weight of any topic for any two different fields; and computing the similarity between any two documents according to those probabilities and the matching weights of the topics for the fields to which the two documents belong.
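The steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: the function names, the whitespace tokenizer (a stand-in for a real word segmenter), the matching-weight estimator (topic co-occurrence in related cross-field documents, normalised per field pair), and the scoring rule (a weighted dot product of topic distributions). The per-document topic probabilities `theta` are taken as given; in practice they would come from a topic model fitted on the vocabulary data set.

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "the", "of", "and", "for", "in", "to"}  # illustrative only

def tokenize(text):
    """Whitespace segmentation plus stop-word removal (stand-in for a real segmenter)."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def incidence_matrix(doc_ids, relations):
    """Symmetric 0/1 matrix: entry [i][j] is 1 when documents i and j are related."""
    index = {d: i for i, d in enumerate(doc_ids)}
    m = [[0] * len(doc_ids) for _ in doc_ids]
    for a, b in relations:
        m[index[a]][index[b]] = m[index[b]][index[a]] = 1
    return m

def matching_weights(doc_ids, fields, theta, incidence):
    """Assumed estimator: for each field pair, weight topic k by how strongly it
    co-occurs in related cross-field documents, normalised to sum to 1."""
    n_topics = len(next(iter(theta.values())))
    raw = defaultdict(lambda: [0.0] * n_topics)
    for i, a in enumerate(doc_ids):
        for j, b in enumerate(doc_ids):
            if i < j and incidence[i][j] and fields[a] != fields[b]:
                key = frozenset((fields[a], fields[b]))
                for k in range(n_topics):
                    raw[key][k] += theta[a][k] * theta[b][k]
    weights = {}
    for key, vals in raw.items():
        total = sum(vals)
        weights[key] = [v / total if total else 0.0 for v in vals]
    return weights

def similarity(a, b, fields, theta, weights):
    """Assumed scoring rule: weighted dot product of the two topic distributions."""
    w = weights.get(frozenset((fields[a], fields[b])))
    if w is None:  # no related documents were seen for this field pair
        return 0.0
    return sum(wk * pa * pb for wk, pa, pb in zip(w, theta[a], theta[b]))

# Toy data: two fields, two topics, hand-written topic probabilities.
doc_ids = ["p1", "p2", "n1", "n2"]
fields = {"p1": "patent", "p2": "patent", "n1": "news", "n2": "news"}
relations = [("p1", "n1"), ("p2", "n2")]
theta = {"p1": [0.9, 0.1], "n1": [0.8, 0.2], "p2": [0.2, 0.8], "n2": [0.1, 0.9]}

inc = incidence_matrix(doc_ids, relations)
weights = matching_weights(doc_ids, fields, theta, inc)
print(similarity("p1", "n1", fields, theta, weights))
print(similarity("p1", "n2", fields, theta, weights))
```

On this toy data the related pair (`p1`, `n1`) scores higher than the unrelated pair (`p1`, `n2`), which is the behaviour the matching weights are meant to encourage. Keying the weights by an unordered field pair keeps the score symmetric in its two documents.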