TEXT MINING FOR AUTOMATICALLY DETERMINING SEMANTIC RELATEDNESS

Abstract

Described herein is an approach for automatically determining the semantic relatedness of documents to semantic concepts. A first text mining analysis extracts a set of reference concepts from reference documents. A second text mining analysis extracts a set of test concepts from test documents that contain a mixture of new concepts and reference concepts. An extended co-occurrence matrix is then computed; it indicates the frequency of co-occurrence (RCCF) of each new and each reference concept in the test documents with all other new and reference concepts. The extended co-occurrence matrix is used to compute a new concept relatedness score (NCRS) for each new concept. A document similarity score (DSS) is computed for each test document by aggregating, inter alia, the NCRS of each new concept with the RCCF of each reference concept. The DSS represents the semantic relatedness of the test document to the totality of the reference concepts.
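The abstract names the quantities (extended co-occurrence matrix, RCCF, NCRS, DSS) but does not give their formulas. The following Python sketch is one possible reading under explicit assumptions, not the patented method. It assumes that concept extraction has already been done, so each document is a list of concept strings, and that co-occurrence is counted once per document. It further assumes that NCRS is the share of a new concept's co-occurrences that involve reference concepts, that the RCCF weight is a reference concept's total co-occurrence count scaled to the most frequent reference concept, and that DSS is the plain mean of these per-concept scores. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_matrix(test_docs):
    """Extended co-occurrence matrix over new and reference concepts.

    m[a][b] = number of test documents containing both a and b.
    Document-level counting is an assumption; the abstract does not
    specify a co-occurrence window.
    """
    m = defaultdict(lambda: defaultdict(int))
    for concepts in test_docs:
        for a, b in combinations(sorted(set(concepts)), 2):
            m[a][b] += 1
            m[b][a] += 1
    return m


def ncrs(concept, m, reference):
    """Hypothetical NCRS: fraction of a concept's co-occurrences
    that are with reference concepts (0.0 if it co-occurs with nothing)."""
    row = m.get(concept, {})
    total = sum(row.values())
    if total == 0:
        return 0.0
    return sum(n for c, n in row.items() if c in reference) / total


def rccf(m, reference):
    """Hypothetical RCCF weight per reference concept: its total
    co-occurrence count, scaled so the most frequent one scores 1.0."""
    raw = {r: sum(m.get(r, {}).values()) for r in reference}
    peak = max(raw.values(), default=0)
    return {r: (v / peak if peak else 0.0) for r, v in raw.items()}


def document_similarity(concepts, m, reference, rccf_weights):
    """Hypothetical DSS: mean of NCRS (for new concepts) and RCCF weight
    (for reference concepts) over the distinct concepts of one document."""
    unique = set(concepts)
    if not unique:
        return 0.0
    scores = [
        rccf_weights[c] if c in reference else ncrs(c, m, reference)
        for c in unique
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: reference concepts and pre-extracted test-document concepts.
    reference = {"neural network", "gradient descent"}
    test_docs = [
        ["neural network", "backpropagation", "gradient descent"],
        ["backpropagation", "neural network"],
        ["tax law", "court ruling"],
    ]
    m = cooccurrence_matrix(test_docs)
    weights = rccf(m, reference)
    for doc in test_docs:
        print(round(document_similarity(doc, m, reference, weights), 3))
```

With this toy data, the new concept "backpropagation" only ever co-occurs with reference concepts, so its NCRS is 1.0 and the first two documents score close to 1, while the third document, whose concepts never co-occur with a reference concept, scores 0. The equal-weight mean is the simplest aggregation that combines NCRS and RCCF; a weighted sum would be an equally plausible reading of "aggregating, inter alia".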
