
Evaluating commonality of documents using segment vector, co-occurrence matrix, and common co-occurrence matrix


Abstract

In evaluating the commonality of documents, each sentence is represented by a binary vector whose components indicate the presence or absence of the corresponding terms, and the concept of a common vector among documents is introduced. One sentence vector is taken from each of the documents to form a combination of sentence vectors, and the common vector for that combination has "1" only in the components that are "1" in all of the vectors, the other components being "0". The commonality of a document set is evaluated by taking, over all the common vectors, the sum or the squared sum of the number of nonzero components in each common vector.
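The computation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the whitespace tokenization, the fixed `vocabulary` list, and the toy sentences are all assumptions made for the example, and any normalization the patent may apply is left out.

```python
from itertools import product


def sentence_vector(sentence, vocabulary):
    """Binary vector: 1 if the term occurs in the sentence, else 0.

    Tokenization by whitespace is an assumption made for this sketch.
    """
    words = set(sentence.lower().split())
    return [1 if term in words else 0 for term in vocabulary]


def common_vector(vectors):
    """Component is 1 only if it is 1 in every given vector."""
    return [int(all(column)) for column in zip(*vectors)]


def commonality(documents, vocabulary, squared=False):
    """Sum (or squared sum) of nonzero counts over all common vectors.

    `documents` is a list of documents, each a list of sentence strings.
    One sentence is taken from each document for every combination.
    """
    doc_vectors = [
        [sentence_vector(s, vocabulary) for s in doc] for doc in documents
    ]
    total = 0
    for combination in product(*doc_vectors):
        nonzero = sum(common_vector(combination))
        total += nonzero ** 2 if squared else nonzero
    return total


# Hypothetical toy data, chosen only to make the arithmetic easy to follow.
vocabulary = ["cat", "dog", "fish"]
doc_a = ["cat dog", "fish"]
doc_b = ["dog cat fish", "dog"]

plain_score = commonality([doc_a, doc_b], vocabulary)
squared_score = commonality([doc_a, doc_b], vocabulary, squared=True)
print(plain_score, squared_score)  # 4 6
```

With these two toy documents there are four sentence combinations, whose common vectors have 2, 1, 1, and 0 nonzero components, which gives a sum of 4 and a squared sum of 6.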

Bibliographic data

  • Publication number: US7392175B2
  • Publication date: 2008-06-24
  • Original format: PDF
  • Applicant/Assignee: TAKAHIKO KAWATANI
  • Application number: US20030694773
  • Inventor: TAKAHIKO KAWATANI
  • Filing date: 2003-10-29
  • Classification: G06F17/27; G06F17/28; G06F17/30
  • Country: US
  • Indexed: 2022-08-21 20:10:22
