
KEYWORD EXTRACTION DEVICE, EXTRACTION METHOD, DOCUMENT RETRIEVAL SYSTEM, RETRIEVAL METHOD, DEVICE AND METHOD FOR CLASSIFYING DOCUMENT, AND PROGRAM


Abstract

PROBLEM TO BE SOLVED: Even when the anchor character strings of the links pointing to a document are used as the object of retrieval or classification, an anchor character string does not necessarily describe the contents of the document completely, and narrowing-down retrieval cannot be performed with sufficient accuracy.

SOLUTION: A document cluster information acquiring means 12 extracts link information from a given document, generates a document reference relation table, determines whether the given document starts from the top page, and registers the document in a document cluster table according to the result. A document keyword determining means 14 refers to the document reference relation table and the document cluster table. For each document in a cluster, it sets the anchor character strings of links arriving from outside the site as site-outside keywords, and sets the series of anchor character strings obtained by tracing back the links from documents in the same cluster as site-inside keywords. It stores both kinds of keywords in a document keyword storage part 22.

COPYRIGHT: (C)2004,JPO
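The keyword assignment described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the input layout (`pages` mapping a URL to its `(anchor, target)` links), and the example URLs are all hypothetical. It also assumes that a cluster is simply the set of pages on the same host as a top page, and that the "series of anchor character strings" is the chain of anchors along the shortest in-site link path from the top page.

```python
from collections import deque
from urllib.parse import urlparse


def site_of(url):
    """Assumption: a site (and hence a cluster) is identified by the URL host."""
    return urlparse(url).netloc


def build_reference_table(pages):
    """Flatten pages into (source_url, anchor_text, target_url) rows."""
    return [(src, anchor, dst)
            for src, links in pages.items()
            for anchor, dst in links]


def outside_keywords(ref_table):
    """Collect anchor strings of links that arrive from a different site."""
    result = {}
    for src, anchor, dst in ref_table:
        if site_of(src) != site_of(dst):
            result.setdefault(dst, []).append(anchor)
    return result


def inside_keywords(ref_table, top_page):
    """Collect, per page, the anchor chain leading to it from top_page.

    A breadth-first search over in-site links yields the shortest chain,
    which matches tracing the links back from the page to the top page.
    """
    site = site_of(top_page)
    out_links = {}
    for src, anchor, dst in ref_table:
        if site_of(src) == site and site_of(dst) == site:
            out_links.setdefault(src, []).append((anchor, dst))
    chains = {top_page: []}
    queue = deque([top_page])
    while queue:
        page = queue.popleft()
        for anchor, dst in out_links.get(page, []):
            if dst not in chains:  # keep the first (shortest) chain only
                chains[dst] = chains[page] + [anchor]
                queue.append(dst)
    return chains


# Hypothetical example data: two sites, one cross-site link.
pages = {
    "http://a.example/": [("Products", "http://a.example/products.html")],
    "http://a.example/products.html": [("Printers", "http://a.example/printers.html")],
    "http://b.example/links.html": [("Printer catalogue", "http://a.example/printers.html")],
}
table = build_reference_table(pages)
outside = outside_keywords(table)
inside = inside_keywords(table, "http://a.example/")
```

In this example the printer page receives the site-outside keyword "Printer catalogue" and the site-inside chain "Products", "Printers". Keeping the two kinds of keyword separate is what would let a retrieval system narrow results by site context as well as by external description.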

Bibliographic data

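  • Publication number: JP2004078446A

  • Publication date: 2004-03-11

  • Original format: PDF

  • Applicant/patentee: NEC CORP

  • Application number: JP20020236195

  • Filing date: 2002-08-14

  • Classification: G06F17/30

  • Country: JP

  • Date added to database: 2022-08-21 23:35:38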
