
Topic word acquisition apparatus, method, and program


Abstract

PROBLEM TO BE SOLVED: To acquire topic words with emphasis on whether each topic word is relevant to specific information represented by at least one of a date and a place.

SOLUTION: A document acquisition unit 12 searches a document index 20 and acquires documents relating to an input keyword and date. A topic word candidate extraction unit 14 divides each retrieved document into words or characters, generates divided components that each start at one word or character and end at the end of the document, sorts the generated components in word/character order, and extracts topic word candidates based on the number of words or characters that match from the head between adjacent components in the sorted list. A date-relevant topic word acquisition unit 16 searches the document index 20 to obtain four counts: documents including both the topic word candidate and the date, documents including only the topic word candidate, documents including only the date, and documents including neither. If the chi-square value calculated from these counts is equal to or larger than a threshold, the unit acquires the topic word candidate as a topic word highly relevant to the date.
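The candidate extraction step resembles a sorted-suffix (suffix array) scan, and can be sketched as below. This is only an illustrative sketch under assumptions, not the patented implementation: the function name `extract_candidates`, the `min_len` cutoff, and the rule that any shared head of at least `min_len` tokens becomes a candidate are hypothetical choices that the abstract does not specify.

```python
def extract_candidates(tokens, min_len=2):
    """Return repeated token sequences found by comparing adjacent sorted suffixes.

    `min_len` is an assumed cutoff; the abstract only says candidates are
    chosen "on the basis of" the number of matching tokens.
    """
    # Each divided component starts at one token and runs to the document end.
    suffixes = sorted(tokens[i:] for i in range(len(tokens)))
    candidates = set()
    for left, right in zip(suffixes, suffixes[1:]):
        # Count how many tokens match from the head of the two neighbours.
        match = 0
        for a, b in zip(left, right):
            if a != b:
                break
            match += 1
        if match >= min_len:
            candidates.add(tuple(left[:match]))
    return candidates


if __name__ == "__main__":
    doc = "the rain in spain the rain in paris".split()
    print(sorted(extract_candidates(doc)))
```

Sorting places suffixes that share a long head next to each other, so every repeated sequence shows up as a shared prefix of some adjacent pair. Materializing every suffix as a list costs quadratic memory, which is acceptable for a sketch but not for large documents.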
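The date-relevance test uses the four document counts as a 2x2 contingency table. A minimal sketch follows, assuming the standard Pearson chi-square statistic for a 2x2 table; the abstract does not give the formula or the threshold, so the function names and the default threshold of 3.84 (the 5% critical value for one degree of freedom) are assumptions for illustration.

```python
def chi_square_2x2(both, cand_only, date_only, neither):
    """Pearson chi-square statistic for a 2x2 table of document counts."""
    n = both + cand_only + date_only + neither
    denom = (
        (both + cand_only)        # documents containing the candidate
        * (date_only + neither)   # documents not containing the candidate
        * (both + date_only)      # documents containing the date
        * (cand_only + neither)   # documents not containing the date
    )
    if denom == 0:
        # A zero marginal means the table carries no association signal.
        return 0.0
    return n * (both * neither - cand_only * date_only) ** 2 / denom


def is_date_relevant(both, cand_only, date_only, neither, threshold=3.84):
    """Accept the candidate when the statistic reaches the assumed threshold."""
    return chi_square_2x2(both, cand_only, date_only, neither) >= threshold


if __name__ == "__main__":
    # Hypothetical counts: the candidate co-occurs with the date far more
    # often than independence would predict.
    print(is_date_relevant(30, 10, 20, 940))
```

The statistic is symmetric, so a large value can also mean the candidate and date co-occur less often than expected. A fuller implementation might additionally check that the observed co-occurrence count exceeds the expected one.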
