
SYSTEM AND METHOD FOR AUTOMATICALLY DISCOVERING A HIERARCHY OF CONCEPTS FROM A CORPUS OF DOCUMENTS


Abstract

The invention is a method, system, and computer program for automatically discovering concepts from a corpus of documents and generating a labeled concept hierarchy. Signatures are first extracted from the corpus, and the similarity between signatures is computed using a statistical measure. The frequency distribution of the signatures is then refined to reduce inaccuracy in the similarity measure, and the signatures are disambiguated to address polysemy. The similarity measure is recomputed from the refined frequency distribution and the disambiguated signatures so that it more closely reflects the actual similarity between signatures. The recomputed measure is used to cluster related signatures into concepts, which are arranged in a concept hierarchy. For a particular concept, the hierarchy automatically generates a query and retrieves the relevant documents associated with that concept.
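The extract, compare, and cluster steps of the abstract can be illustrated with a minimal Python sketch. The abstract does not name the statistical measure, the clustering algorithm, or the labeling rule, so everything here is an assumption made for illustration: signatures are taken to be lowercase non-stopword tokens, similarity is a cosine-style document co-occurrence score, clustering is single-link merging above a threshold, and a concept is labeled by its most frequent signature. The function names (`extract_signatures`, `similarity`, `cluster`, `label`) are hypothetical, and the frequency-refinement and disambiguation steps are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def extract_signatures(doc, stopwords=frozenset()):
    """Assumed extraction rule: lowercase tokens that are not stopwords."""
    tokens = (t.strip(".,;:!?").lower() for t in doc.split())
    return {t for t in tokens if t and t not in stopwords}


def similarity(doc_signatures):
    """Assumed measure: co(a, b) / sqrt(df(a) * df(b)) over documents."""
    df = Counter()  # number of documents containing each signature
    co = Counter()  # number of documents containing both signatures
    for sigs in doc_signatures:
        df.update(sigs)
        for a, b in combinations(sorted(sigs), 2):
            co[(a, b)] += 1
    sim = {(a, b): n / math.sqrt(df[a] * df[b]) for (a, b), n in co.items()}
    return sim, df


def cluster(signatures, sim, threshold):
    """Single-link clustering: merge any pair whose similarity >= threshold."""
    parent = {s: s for s in signatures}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in sim.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in signatures:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())


def label(concept, df):
    """Assumed labeling rule: most frequent signature, ties broken by name."""
    return max(sorted(concept), key=lambda s: df[s])


def discover_concepts(docs, threshold=0.5, stopwords=frozenset()):
    """Run the sketch end to end and return {label: set of signatures}."""
    doc_signatures = [extract_signatures(d, stopwords) for d in docs]
    sim, df = similarity(doc_signatures)
    concepts = cluster(set(df), sim, threshold)
    return {label(c, df): c for c in concepts}
```

Calling `discover_concepts` on a list of document strings returns a flat set of labeled concepts. A hierarchy could be approximated by re-running `cluster` at progressively lower thresholds so that concepts merge into broader parents, though the abstract does not say that this is how the patented method builds its hierarchy.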
