Core keywords extraction system and method in document
Abstract
The present invention relates to a system for extracting core keywords in a document. The system comprises: a candidate word selection unit that selects candidate words by analyzing keywords in a plurality of unstructured (atypical) text documents; a similar-meaning word selection unit that clusters words through word embedding in the plurality of unstructured text documents and selects words of similar meaning by analyzing the meanings of the words chosen by the candidate word selection unit; and a keyword extraction unit that extracts the final keywords by normalizing the weights applied to the candidate words and the similar-meaning words.

The present invention also relates to a method for extracting core keywords in a document. The method consists of: a first step of selecting candidate words by analyzing keywords in a plurality of unstructured text documents; a second step of grouping words through word embedding in the plurality of unstructured text documents and analyzing the meanings of the words selected in the first step in order to select words of similar meaning; and a third step of extracting the final keywords by normalizing the weights applied to the candidate words from the first step and the similar-meaning words from the second step.

Accordingly, extraction quality can be improved because keyword extraction uses normalized weights for word sets obtained with different keyword selection algorithms. In addition, the weight of a word is determined by the link relations of words in the document. By selecting both a candidate word set and a set of related words with similar meaning in the document, and by correcting the weight applied to each word set (weight normalization), the invention can extract not only the core words of the document (those with many links) but also associated words related to those core words.
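The three steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract does not name the algorithms, so a TextRank-style iteration over a co-occurrence graph stands in for the link-based weighting, co-occurrence count vectors stand in for trained word embeddings, and min-max scaling stands in for weight normalization. All function names, parameters, and thresholds are hypothetical.

```python
import math
from collections import defaultdict


def cooccurrence(docs, window=2):
    """Build a symmetric co-occurrence graph from tokenized documents."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return {w: dict(neighbors) for w, neighbors in graph.items()}


def link_scores(graph, damping=0.85, iterations=30):
    """Step 1 (assumed): weight words by link relations, TextRank-style."""
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[v] * graph[v][w] / sum(graph[v].values())
                for v in graph[w]
            )
            for w in graph
        }
    return scores


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * b.get(k, 0.0) for k, x in a.items())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_words(graph, candidates, threshold=0.5):
    """Step 2 (assumed): keep non-candidate words close to some candidate.

    Co-occurrence rows are used here as a stand-in for word embeddings.
    """
    related = {}
    for w in graph:
        if w in candidates:
            continue
        best = max((cosine(graph[w], graph[c]) for c in candidates), default=0.0)
        if best >= threshold:
            related[w] = best
    return related


def normalize(weights):
    """Step 3 (assumed): min-max scale one word set's weights into [0, 1]."""
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        return {w: 1.0 for w in weights}
    return {w: (s - lo) / (hi - lo) for w, s in weights.items()}


def extract_keywords(docs, n_candidates=3, top_k=5):
    """Merge both normalized word sets and return the top-weighted words."""
    graph = cooccurrence(docs)
    scores = link_scores(graph)
    top = sorted(scores, key=scores.get, reverse=True)[:n_candidates]
    candidates = {w: scores[w] for w in top}
    related = similar_words(graph, candidates)
    merged = {**normalize(related), **normalize(candidates)}
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


docs = [
    ["keyword", "extraction", "uses", "word", "embedding"],
    ["word", "embedding", "groups", "similar", "words"],
    ["keyword", "weights", "are", "normalized"],
]
keywords = extract_keywords(docs)
```

Normalizing each word set separately before merging is what puts the two differently scaled scores (link-based rank and similarity) on a common range, which is the role the abstract assigns to weight normalization. A real system would likely replace the co-occurrence vectors with trained embeddings and tune the thresholds.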