
EXTREMELY SIMILAR DOCUMENT EXTRACTION METHOD


Abstract

PROBLEM TO BE SOLVED: To accurately extract documents that are extremely similar to a given document, with little noise.

SOLUTION: Document input processing 2 is performed on a new document 1, followed by word appearance pattern extraction processing 3, which uses dictionaries 11 and 12 to extract words of specified parts of speech, eliminate unneeded words, and determine the order in which words appear. Extremely similar document decision processing 4 is then performed. It generates a word information table 13 and collates it with a DB information table 14, which is obtained by applying processing 3 to every document in the DB. For each document unit, it extracts the words that appear in common and the strings of words whose order of appearance is the same. It then adds the weighted number of words appearing in common to the value of a monotonically increasing function whose variable is the number of words constituting the word string, which gives the degree of extreme similarity for each sentence unit. A document is judged extremely similar when sentence units whose degree of extreme similarity exceeds a certain threshold value continue for more than a certain length. The result is displayed 5 and registration judgement 6 is performed.

COPYRIGHT: (C)1997,JPO
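The scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions: a small stop-word set stands in for dictionaries 11 and 12, the "string of words in the same order" is taken to be the longest common subsequence, the monotonically increasing function is the square of its length, and each sentence of the new document is scored against its best-matching sentence in the stored document. All function names, the weight, the threshold, and the run length are hypothetical.

```python
import re

# Hypothetical stand-in for dictionaries 11 and 12 (the patent uses
# part-of-speech and unneeded-word dictionaries; this is only a stop list).
STOPWORDS = {"the", "a", "an", "of", "to", "is", "and", "in"}

def extract_pattern(sentence):
    """Return the sentence's remaining words in order of appearance."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]

def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def sentence_score(a, b, weight=1.0):
    """Weighted count of common words plus f(ordered-string length)."""
    common = len(set(a) & set(b))
    ordered = lcs_length(a, b)
    return weight * common + ordered ** 2  # f(n) = n**2 is an assumed choice

def is_extremely_similar(new_doc, db_doc, threshold=10.0, min_run=2):
    """True if at least min_run consecutive sentences score above threshold."""
    db_patterns = [extract_pattern(s) for s in db_doc]
    run = 0
    for sentence in new_doc:
        p = extract_pattern(sentence)
        best = max((sentence_score(p, q) for q in db_patterns), default=0.0)
        run = run + 1 if best > threshold else 0
        if run >= min_run:
            return True
    return False
```

Because the ordered-string term grows faster than the common-word count, two sentences that share the same words in the same order score much higher than sentences that merely share vocabulary, which is what keeps loosely related documents out of the result.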

Bibliographic data

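  • Publication number: JPH09198409A
  • Publication date: 1997-07-31
  • Original format: PDF
  • Applicant/Patentee: HITACHI LTD
  • Application number: JP19960026185
  • Filing date: 1996-01-19
  • Classification: G06F17/30
  • Country: JP
  • Date added to database: 2022-08-22 03:32:46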
