CITATION DETECTION DEVICE, DEVICE AND METHOD FOR CREATING ORIGINAL DOCUMENT DATABASE, PROGRAM AND RECORDING MEDIUM

Abstract

PROBLEM TO BE SOLVED: To accurately detect, with low computational complexity, whether an input document includes a citation consisting of two or more consecutive sentences taken from another document without alteration of the character string.

SOLUTION: An original document DB 4 is prepared as follows. Each document among the original documents that are candidate citation sources is divided into partial character strings that can serve as units of citation. A summary is created for each partial character string, and the summaries are arranged in the order in which the partial character strings appear to form a digest of the document. For each partial character string, the digest is registered together with the document ID in a form that permits longest prefix matching. A digest creation means 5 converts an input document into a digest in the same manner. A citation detection means 6 searches the original document DB 4 by longest prefix match, using the digest of the input document as the key, and outputs the document ID of any document for which the number of consecutively matching summaries is equal to or greater than a predetermined threshold.

COPYRIGHT: (C)2010,JPO&INPIT
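The scheme described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the abstract does not say how partial character strings are delimited, how summaries are computed, or how the database is organized. Here sentences stand in for citation units, a truncated SHA-1 hash stands in for the summary, and a trie keyed by summaries stands in for the original document DB. The names `split_units`, `summary`, `digest`, `build_db`, and `detect` are all hypothetical.

```python
import hashlib
import re


def split_units(text):
    """Split text into sentence-like units (assumed unit of citation)."""
    parts = re.findall(r"[^.!?]+[.!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def summary(unit):
    """Short fixed-length summary of one unit (assumption: truncated SHA-1)."""
    return hashlib.sha1(unit.encode("utf-8")).hexdigest()[:8]


def digest(text):
    """Summaries of all units, in order of appearance."""
    return [summary(u) for u in split_units(text)]


def new_node():
    return {"children": {}, "ids": set()}


def build_db(docs):
    """Build a trie over digests (hypothetical stand-in for original document DB 4).

    For every unit position in every document, the digest starting at that
    position is inserted, so a match may begin at any unit.
    """
    root = new_node()
    for doc_id, text in docs.items():
        d = digest(text)
        for start in range(len(d)):
            node = root
            for s in d[start:]:
                node = node["children"].setdefault(s, new_node())
                node["ids"].add(doc_id)
    return root


def detect(root, text, threshold=2):
    """Return IDs of documents sharing a run of >= threshold units with text."""
    d = digest(text)
    hits = set()
    for start in range(len(d)):
        node, depth = root, 0
        # Longest prefix match: descend while summaries keep matching.
        for s in d[start:]:
            child = node["children"].get(s)
            if child is None:
                break
            node, depth = child, depth + 1
        if depth >= threshold:
            hits |= node["ids"]
    return hits


originals = {
    "D1": "Alpha is first. Beta is second. Gamma is third.",
    "D2": "Delta is here. Epsilon is there.",
}
db = build_db(originals)
print(detect(db, "Intro text. Beta is second. Gamma is third. Outro."))
```

Because only short summaries are compared, each lookup costs time proportional to the length of the matching run rather than to the size of the original collection, which is in the spirit of the reduced computational complexity the abstract claims. Two simplifications are worth noting: for each starting position the sketch reports only the documents that share the longest run, and hash collisions between different sentences are ignored.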
