METHOD AND SERVER FOR EXTRACTING TOPIC AND EVALUATING COMPATIBILITY OF THE EXTRACTED TOPIC

Abstract

Provided are a method and a server for extracting a topic and evaluating the suitability of the extracted topic. The topic extraction server of the present invention comprises: a text preprocessing unit that extracts nouns from a classified document set, formed according to classification information, and removes stop words; a keyword extracting unit that calculates a weight for each word remaining after stop-word removal and extracts keywords, which are the words representative of the classified document set; a seed selecting unit that calculates a weight for each extracted keyword and selects seeds, each seed being the principal word of a cluster formed by grouping a keyword with its related words; an initial clustering unit that, based on the selected seeds, groups the keywords that frequently appear in the same sentence as a seed into one cluster; and a cluster combining unit that extracts a classified topic set by combining similar clusters among the clusters so formed.
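
The five units named in the abstract can be read as a pipeline. The following is only a minimal sketch of that reading, not the patented method: the abstract does not say how either weight is calculated or how cluster similarity is measured, so raw word frequency, total sentence co-occurrence, and Jaccard similarity are assumptions chosen for illustration. Noun extraction is omitted because it would need a part-of-speech tagger, and every name here (`STOP_WORDS`, `extract_topics`, `min_cooc`, `threshold`, and so on) is hypothetical.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}  # hypothetical list


def preprocess(document):
    """Text preprocessing: split into sentences, lower-case, drop stop words.
    Noun extraction (POS tagging) is omitted in this sketch."""
    sentences = []
    for raw in document.split("."):
        tokens = [w for w in raw.lower().split() if w not in STOP_WORDS]
        if tokens:
            sentences.append(tokens)
    return sentences


def extract_keywords(sentences, top_k):
    """Keyword extraction. Assumed weight: raw frequency in the document set."""
    counts = Counter(w for s in sentences for w in s)
    return sorted(counts, key=lambda w: (-counts[w], w))[:top_k]


def cooccurrence(sentences, keywords):
    """Count, for each keyword pair, the sentences in which both appear."""
    kw, pairs = set(keywords), Counter()
    for s in sentences:
        present = sorted(kw & set(s))
        for i, a in enumerate(present):
            for b in present[i + 1:]:
                pairs[(a, b)] += 1
    return pairs


def pair_count(pairs, a, b):
    """Look up a pair regardless of argument order."""
    return pairs[(a, b)] if a < b else pairs[(b, a)]


def select_seeds(keywords, pairs, n_seeds):
    """Seed selection. Assumed weight: total co-occurrence with other keywords."""
    weight = {k: sum(pair_count(pairs, k, o) for o in keywords if o != k)
              for k in keywords}
    return sorted(keywords, key=lambda k: (-weight[k], k))[:n_seeds]


def initial_clusters(seeds, keywords, pairs, min_cooc):
    """Initial clustering: each seed plus keywords sharing >= min_cooc sentences."""
    return [{s} | {k for k in keywords
                   if k != s and pair_count(pairs, s, k) >= min_cooc}
            for s in seeds]


def merge_clusters(clusters, threshold):
    """Cluster combining: greedily merge clusters whose Jaccard similarity
    (an assumed measure) reaches the threshold."""
    clusters = [set(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                if len(a & b) / len(a | b) >= threshold:
                    clusters[i] = a | b
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


def extract_topics(documents, top_k=10, n_seeds=3, min_cooc=2, threshold=0.5):
    """Run the five stages in order and return the topic set as word sets."""
    sentences = [s for d in documents for s in preprocess(d)]
    keywords = extract_keywords(sentences, top_k)
    pairs = cooccurrence(sentences, keywords)
    seeds = select_seeds(keywords, pairs, n_seeds)
    return merge_clusters(initial_clusters(seeds, keywords, pairs, min_cooc),
                          threshold)
```

Calling `extract_topics(documents)` on a list of document strings from one classification returns a list of word sets, one per topic. The defaults for `top_k`, `n_seeds`, `min_cooc`, and `threshold` are arbitrary and would need tuning on real data.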
