Core keywords extraction system and method in document

Abstract

The present invention relates to a system for extracting core keywords from a document. The system comprises: a candidate word selection unit that selects candidate words by analyzing keywords in a plurality of atypical (unstructured) text documents; a similar-meaning word selection unit that clusters the words in those documents through word embeddings and, by analyzing the meanings of the words chosen by the candidate word selection unit, selects words with similar meanings; and a keyword extraction unit that extracts the final keywords by normalizing the weights applied to the candidate words and the similar-meaning words.

The invention also relates to a method for extracting core keywords from a document, comprising: a first step of selecting candidate words by analyzing keywords in a plurality of atypical text documents; a second step of grouping the words in those documents through word embeddings and analyzing the meanings of the words selected in the first step to select similar-meaning words; and a third step of extracting the final keywords by normalizing the weights applied to the candidate words from the first step and the similar-meaning words from the second step.

Because keyword extraction uses normalized weights for word sets obtained with different keyword selection algorithms, extraction quality can be improved. In addition, each weight is determined by the link relations of words in the document. By selecting both a candidate word set and a set of related words with similar meaning, and by correcting the weights applied to each set (weight normalization), the invention can extract not only the core keywords that are well embedded in the document (those with many links) but also associated words related to those keywords.
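
The three steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the abstract only says that weights come from the link relations of words, so the TextRank-style co-occurrence scoring, the cosine-similarity threshold, the sum-to-one normalization, and every function and parameter name here (`link_weights`, `similar_words`, `normalize`, `extract_keywords`, `window`, `threshold`) are assumptions. The embeddings are expected to be supplied by the caller, for example from a separately trained word-embedding model.

```python
import math
from collections import defaultdict


def link_weights(docs, window=2, damping=0.85, iters=30):
    """Step 1 (assumed): score words by co-occurrence links, TextRank-style."""
    links = defaultdict(set)
    for tokens in docs:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                if v != w:
                    links[w].add(v)
                    links[v].add(w)
    score = {w: 1.0 for w in links}
    for _ in range(iters):
        # Each word receives a share of the score of every word it links to.
        score = {
            w: (1 - damping)
            + damping * sum(score[v] / len(links[v]) for v in links[w])
            for w in links
        }
    return score


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_words(candidates, embeddings, threshold=0.8):
    """Step 2 (assumed): keep non-candidate words close to some candidate."""
    sims = {}
    for word, vec in embeddings.items():
        if word in candidates:
            continue
        best = max(
            (cosine(vec, embeddings[c]) for c in candidates if c in embeddings),
            default=0.0,
        )
        if best >= threshold:
            sims[word] = best
    return sims


def normalize(weights):
    """Step 3 helper (assumed): rescale a weight set so it sums to 1."""
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()} if total else {}


def extract_keywords(docs, embeddings, n_candidates=3, top_k=5):
    """Merge both normalized word sets and return the top-ranked words."""
    scores = link_weights(docs)
    top = sorted(scores, key=scores.get, reverse=True)[:n_candidates]
    candidates = normalize({w: scores[w] for w in top})
    related = normalize(similar_words(set(top), embeddings))
    merged = {**related, **candidates}  # the two sets are disjoint
    keywords = sorted(merged, key=merged.get, reverse=True)[:top_k]
    return keywords, merged
```

Normalizing each set separately puts link-based scores and similarity scores on a comparable scale before they are merged, which is the role the abstract assigns to weight normalization. Other schemes, such as min-max scaling, would fit the same description.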

Bibliographic data

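  • Publication number: KR102019194B1
  • Publication date: 2019-09-06
  • Original format: PDF
  • Applicant/patentee: 주식회사 와이즈넛
  • Application number: KR20170156375
  • Inventors: 김문종; 장정훈
  • Filing date: 2017-11-22
  • Classification: G06F17/27; G06F16; G06N3/08
  • Country: KR
  • Date indexed: 2022-08-21 11:47:47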
