System and method for dynamically evaluating latent concepts in unstructured documents

Abstract

A system and method for dynamically evaluating latent concepts in unstructured documents are disclosed. A multiplicity of concepts is extracted from a set of unstructured documents into a lexicon. The lexicon uniquely identifies each concept and its frequency of occurrence. A frequency of occurrence representation is created for the document set. The frequency representation provides an ordered corpus of the frequencies of occurrence of each concept. A subset of concepts is selected from the frequency of occurrence representation by filtering against a pre-defined threshold. A group of weighted clusters of concepts selected from the concept subset is generated. A matrix of best fit approximations is determined for each document weighted against each group of weighted clusters of concepts.
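
The steps named in the abstract can be sketched roughly as follows. This is only an illustrative reading, not the patented method: the abstract does not say how concepts are extracted, how clusters are formed, or how "best fit" is measured. The sketch assumes that a concept is a lowercase word token, that clusters are grown greedily from concepts with similar per-document counts, and that fit is measured by cosine similarity. All function names and the `min_similarity` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def extract_concepts(text):
    # Assumption: a "concept" is approximated by a lowercase word token.
    return re.findall(r"[a-z]+", text.lower())


def build_lexicon(doc_counts):
    # The lexicon maps each unique concept to its total frequency of occurrence.
    lexicon = Counter()
    for counts in doc_counts:
        lexicon.update(counts)
    return lexicon


def select_concepts(lexicon, threshold):
    # Order concepts by frequency, then keep those at or above the threshold.
    return [c for c, n in lexicon.most_common() if n >= threshold]


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_concepts(concepts, doc_counts, min_similarity):
    # Assumption: greedy clustering. A concept joins the first cluster whose
    # seed concept has a similar per-document count profile; otherwise it
    # seeds a new cluster.
    clusters = []  # list of (seed_profile, {concept: total_count})
    for concept in concepts:
        profile = {i: c[concept] for i, c in enumerate(doc_counts) if c[concept]}
        total = sum(profile.values())
        for seed, members in clusters:
            if cosine(seed, profile) >= min_similarity:
                members[concept] = total
                break
        else:
            clusters.append((profile, {concept: total}))
    # Weight each concept by its share of the cluster's total count.
    weighted = []
    for _, members in clusters:
        size = sum(members.values())
        weighted.append({c: n / size for c, n in members.items()})
    return weighted


def best_fit_matrix(doc_counts, clusters):
    # One row per document, one column per weighted cluster.
    return [[cosine(counts, cluster) for cluster in clusters]
            for counts in doc_counts]


documents = [
    "apple banana apple",
    "banana apple",
    "car engine car",
    "engine car wheel",
]
doc_counts = [Counter(extract_concepts(d)) for d in documents]
lexicon = build_lexicon(doc_counts)
selected = select_concepts(lexicon, threshold=2)
clusters = cluster_concepts(selected, doc_counts, min_similarity=0.5)
matrix = best_fit_matrix(doc_counts, clusters)
```

In this toy run, "wheel" falls below the threshold and is dropped, the remaining concepts split into a fruit cluster and a vehicle cluster, and each row of `matrix` scores one document against both clusters.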

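Bibliographic details

  • Publication number: US2006089947A1

  • Publication date: 2006-04-27

  • Original format: PDF

  • Applicant/patentee: DAN GALLIVAN; KENJI KAWAI

  • Application number: US20050304406

  • Inventors: DAN GALLIVAN; KENJI KAWAI

  • Filing date: 2005-12-14

  • Classification: G06F17

  • Country: US

  • Date indexed: 2022-08-21 21:46:15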
