REPRESENTATIVE WORD EXTRACTION DEVICE, REPRESENTATIVE WORD EXTRACTION METHOD, AND REPRESENTATIVE WORD EXTRACTION PROGRAM

Abstract

PROBLEM TO BE SOLVED: To extract a word that represents a document group without depending on the number of documents included in the document group.

SOLUTION: A preprocessing part 11 collects document groups, including the target document group from which a representative word is to be extracted, and a reference word acquiring part 13 acquires a reference word that serves as the basis for extracting the representative word. A reference document specifying part 14 specifies, from the document groups supplied by the preprocessing part 11, the reference documents that include the reference word, and a word group extracting part 15 extracts the reference word and the words other than the reference word from the reference documents as a word group. For each word of the extracted word group, an index calculating part 16 calculates an index whose value increases or decreases in accordance with the magnitude of the word's co-occurrence frequency with the reference word. An index correcting part 17 then calculates, for each word of the extracted word group, its degree of rarity in the whole of the document groups and its degree of rarity in the target document group, and corrects the index calculated by the index calculating part 16 by using these two degrees of rarity.
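The flow described in the abstract can be sketched as follows. This is only an illustrative sketch under several assumptions, not the patented method: the abstract does not give any formulas, so the co-occurrence index is taken here to be the number of reference documents containing a word, the degree of rarity is taken to be an IDF-style logarithm, and the correction is taken to be multiplication by the rarity in the whole collection and division by the rarity in the target group. The function names `rarity` and `representative_words` and the representation of documents as sets of words are likewise hypothetical.

```python
import math
from collections import Counter


def rarity(word, docs):
    """Assumed IDF-style rarity: larger when fewer documents contain the word."""
    df = sum(1 for d in docs if word in d)
    return math.log((len(docs) + 1) / (df + 1)) + 1.0


def representative_words(all_docs, target_docs, reference_word):
    """Rank words co-occurring with reference_word (documents are sets of words)."""
    # Reference documents: documents in the whole collection containing the reference word.
    reference_docs = [d for d in all_docs if reference_word in d]

    # Word group and index: each word in a reference document, scored by the
    # number of reference documents it appears in (assumed co-occurrence index).
    index = Counter(w for d in reference_docs for w in d)

    scores = {}
    for word, cooc in index.items():
        r_all = rarity(word, all_docs)        # rarity in the whole collection
        r_target = rarity(word, target_docs)  # rarity in the target group
        # Assumed correction: favor words rare overall but common in the target group.
        scores[word] = cooc * r_all / r_target

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    all_docs = [
        {"camera", "lens", "zoom"},     # target group
        {"camera", "lens", "price"},    # target group
        {"camera", "battery"},
        {"phone", "battery", "price"},
    ]
    for word, score in representative_words(all_docs, all_docs[:2], "camera"):
        print(f"{word}: {score:.3f}")
```

Because both rarity values are ratios over document counts rather than raw counts, the corrected score in this sketch does not grow simply because the target group contains more documents, which is the property the abstract aims for.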
