SIMILAR DOCUMENT SET EXTRACTION DEVICE, SIMILAR DOCUMENT SET EXTRACTION METHOD, SIMILAR DOCUMENT SET EXTRACTION PROGRAM AND STORAGE MEDIUM

Abstract

PROBLEM TO BE SOLVED: To improve the similarity accuracy of an extracted similar document set.

SOLUTION: The similar document set extraction device 1, which extracts similar document sets from documents accumulated in a document database 100, comprises an input means 10 for inputting a numeric value indicating the number of similar document sets to be extracted; a document set extraction processing part 52 that executes the similar document set extraction operation, based on an evaluation function, as many times as the input numeric value specifies; and an output means 20 for outputting the extracted similar document sets. The evaluation function is obtained by summing, over each document contained in a document set, the difference between (a) the similarity of the word vector characterizing the document to the representative vector of the document set containing the document and (b) the similarity of that word vector to the representative vector of an already extracted document set containing the document. A similar document set extraction processing part 53 determines the value of the evaluation function for each document set and extracts the document set for which that value is maximum.

COPYRIGHT: (C)2007,JPO&INPIT
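The evaluation function and the repeated extraction described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify how similarity or the representative vector is computed, or how candidate document sets are generated, so everything below is an assumption made for illustration: cosine similarity, a centroid as the representative vector, a caller-supplied list of candidate sets, a second term of 0 for documents not yet in any extracted set, and the maximum over extracted sets when a document belongs to several. The function names (`cosine`, `centroid`, `evaluate`, `extract_sets`) are hypothetical and do not come from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors, members):
    """Mean of the member word vectors (assumed representative vector)."""
    dim = len(vectors[0])
    return [sum(vectors[i][k] for i in members) / len(members) for k in range(dim)]


def evaluate(vectors, candidate, extracted):
    """Sum, over documents in `candidate`, of
    sim(doc, rep(candidate)) - sim(doc, rep(extracted set containing doc)).

    Assumption: the second term is 0.0 when no extracted set contains the
    document, and the maximum similarity when several do.
    """
    rep = centroid(vectors, candidate)
    total = 0.0
    for doc in candidate:
        covering = [s for s in extracted if doc in s]
        prior = max(
            (cosine(vectors[doc], centroid(vectors, s)) for s in covering),
            default=0.0,
        )
        total += cosine(vectors[doc], rep) - prior
    return total


def extract_sets(vectors, candidates, count):
    """Repeat the extraction `count` times, each time taking the candidate
    set with the maximum evaluation value."""
    extracted = []
    remaining = [frozenset(c) for c in candidates]
    for _ in range(min(count, len(remaining))):
        best = max(remaining, key=lambda c: evaluate(vectors, c, extracted))
        extracted.append(best)
        remaining.remove(best)
    return extracted


# Toy word vectors for five documents (hypothetical data).
vectors = [[1, 0], [1, 0.1], [0, 1], [0.1, 1], [1, 1]]
candidates = [{0, 1}, {2, 3}, {0, 1, 4}]
result = extract_sets(vectors, candidates, count=2)
```

With these toy vectors the sketch selects `{0, 1, 4}` first and `{2, 3}` second. The candidate `{0, 1}` scores low in the second round because both of its documents already belong to an extracted set, so the subtracted term nearly cancels their similarity to their own representative vector.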