Feature language extracting equipment, feature language extraction method and feature language extraction program

Abstract

PROBLEM TO BE SOLVED: To efficiently extract suitable feature words corresponding to a specific category. SOLUTION: Two frequencies are calculated for each word pair: a first appearance frequency, the number of document data items, among a plurality of document data, in which the word pair co-occurs; and a second appearance frequency, the number of document data items in which the word pair co-occurs among those document data associated with a specified category. The value obtained by dividing the first appearance frequency by the second appearance frequency is calculated as the degree of co-occurrence. Network data with words as nodes and the degree of co-occurrence as edge weights is generated as matrix data in the form of a symmetric N × N matrix. The maximum eigenvalue of the generated matrix data is calculated as the degree of aggregation. A cluster, which is a set of words determined from the eigenvector corresponding to the calculated degree of aggregation, is extracted. The degree of attribution of each word to the cluster is calculated. The nodes whose attribution degrees exceed a threshold are extracted as feature words expressing a feature of the specified category. COPYRIGHT: (C)2011, JPO & INPIT
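The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the toy documents and the threshold are hypothetical; the degree of co-occurrence follows the abstract's wording literally (first frequency divided by second); word pairs that never co-occur in the specified category are skipped to avoid division by zero; the maximum eigenvalue is approximated by shifted power iteration; and the attribution degree of a word is taken to be the absolute value of its component in the unit-length dominant eigenvector, which the abstract does not specify.

```python
from itertools import combinations
from math import sqrt


def cooccurrence_degrees(docs, in_category):
    """Return {(word_a, word_b): first_frequency / second_frequency}."""
    first = {}   # pair -> number of all documents containing both words
    second = {}  # pair -> number of category documents containing both words
    for words, flagged in zip(docs, in_category):
        for pair in combinations(sorted(set(words)), 2):
            first[pair] = first.get(pair, 0) + 1
            if flagged:
                second[pair] = second.get(pair, 0) + 1
    # Assumption: pairs absent from the category are skipped (no division by zero).
    return {pair: first[pair] / second[pair] for pair in second}


def build_matrix(degrees):
    """Build the symmetric N x N matrix with words as nodes and degrees as edges."""
    words = sorted({w for pair in degrees for w in pair})
    index = {w: i for i, w in enumerate(words)}
    matrix = [[0.0] * len(words) for _ in words]
    for (a, b), degree in degrees.items():
        matrix[index[a]][index[b]] = degree
        matrix[index[b]][index[a]] = degree
    return words, matrix


def dominant_eigenpair(matrix, iterations=500):
    """Approximate the maximum eigenvalue and its eigenvector by power iteration."""
    n = len(matrix)
    vec = [1.0 / sqrt(n)] * n
    for _ in range(iterations):
        # Iterate with (A + I): the shift keeps the largest eigenvalue of a
        # non-negative matrix dominant in magnitude without changing eigenvectors.
        nxt = [vec[i] + sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient with the unshifted matrix gives the eigenvalue of A.
    av = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
    return sum(v * a for v, a in zip(vec, av)), vec


def feature_words(docs, in_category, threshold):
    """Return (degree of aggregation, words whose attribution exceeds threshold)."""
    degrees = cooccurrence_degrees(docs, in_category)
    if not degrees:
        return 0.0, []
    words, matrix = build_matrix(degrees)
    aggregation, vec = dominant_eigenpair(matrix)
    # Assumption: attribution degree = |eigenvector component| of the word.
    return aggregation, [w for w, x in zip(words, vec) if abs(x) > threshold]


# Hypothetical toy data: four documents, the first two belong to the category.
docs = [
    {"goal", "striker", "match"},
    {"goal", "striker"},
    {"goal", "match", "weather"},
    {"weather", "rain"},
]
in_category = [True, True, False, False]
aggregation, selected = feature_words(docs, in_category, threshold=0.5)
print(round(aggregation, 3), selected)
```

Pure Python is used so the sketch runs without dependencies; for a realistic vocabulary, a dedicated symmetric eigensolver from a numerical library would be the more natural choice.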
