Keyword determining device, determining method, document retrieval apparatus, retrieval method, document classification apparatus and classification method, and program

Abstract

PROBLEM TO BE SOLVED: Even when the anchor character strings of the links that point to a document are used as the object of retrieval or classification, an anchor string does not necessarily describe the document's contents completely, so narrowing-down retrieval cannot be performed with sufficient accuracy.

SOLUTION: A document cluster information acquiring means 12 extracts link information from a given document, generates a document reference relation table, determines whether the given document is reached starting from the top page, and registers the document in a document cluster table according to the result. A document keyword determining means 14 refers to the document reference relation table and the document cluster table and, for each document in each cluster, sets the anchor character strings of links pointing to it from outside the site as site-outside keywords, and sets the series of anchor character strings obtained by tracing back the links to the document within the same cluster as site-inside keywords. The two kinds of keywords are stored separately in a document keyword storage part 22.

COPYRIGHT: (C)2004, JPO
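The SOLUTION describes a two-stage procedure: build a reference table and a cluster table from link information, then derive site-outside and site-inside keywords for each clustered document. The Python sketch below illustrates one possible reading of that procedure; it is not the patented implementation. Assumptions: a "site" is identified by the URL host, a cluster is the set of same-site documents reachable by breadth-first search from a given top page, and the "series of anchor character strings" is taken from the first path found back to the top page. The names `Link`, `build_reference_table`, `build_cluster_table` and `determine_keywords` are hypothetical and correspond only loosely to the document reference relation table, the document cluster table (means 12) and the document keyword determining means 14.

```python
from collections import deque
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Link:
    source: str  # URL of the document containing the link
    target: str  # URL of the document being linked to
    anchor: str  # anchor character string of the link


def site_of(url: str) -> str:
    # Assumption: a "site" is identified by the URL's host (netloc).
    return urlparse(url).netloc


def build_reference_table(links):
    """Hypothetical document reference relation table: target URL -> incoming links."""
    table = {}
    for link in links:
        table.setdefault(link.target, []).append(link)
    return table


def build_cluster_table(links, top_pages):
    """Map each document reachable from a top page via same-site links to that
    top page (its cluster), and remember the link through which it was first
    reached so the path can be traced back later."""
    outgoing = {}
    for link in links:
        outgoing.setdefault(link.source, []).append(link)
    cluster, parent = {}, {}
    for top in top_pages:
        if top in cluster:
            continue
        cluster[top] = top
        queue = deque([top])
        while queue:  # breadth-first search restricted to the top page's site
            doc = queue.popleft()
            for link in outgoing.get(doc, []):
                t = link.target
                if site_of(t) == site_of(top) and t not in cluster:
                    cluster[t] = top
                    parent[t] = link
                    queue.append(t)
    return cluster, parent


def determine_keywords(links, top_pages):
    """Return {doc: {"site_outside": [...], "site_inside": [...]}}."""
    reference = build_reference_table(links)
    cluster, parent = build_cluster_table(links, top_pages)
    keywords = {}
    for doc, top in cluster.items():
        # Site-outside keywords: anchors of incoming links from other sites.
        outside = [l.anchor for l in reference.get(doc, [])
                   if site_of(l.source) != site_of(top)]
        # Site-inside keywords: anchors met while tracing back to the top page.
        inside, cur = [], doc
        while cur in parent:
            inside.append(parent[cur].anchor)
            cur = parent[cur].source
        inside.reverse()  # order from the top page down to the document
        keywords[doc] = {"site_outside": outside, "site_inside": inside}
    return keywords


if __name__ == "__main__":
    links = [
        Link("http://a.example/", "http://a.example/cameras/", "Cameras"),
        Link("http://a.example/cameras/", "http://a.example/cameras/x100.html", "X100"),
        Link("http://blog.example/post", "http://a.example/cameras/x100.html",
             "great compact camera"),
    ]
    for doc, kw in determine_keywords(links, ["http://a.example/"]).items():
        print(doc, kw)
```

With the three sample links, the camera page `x100.html` receives "great compact camera" as a site-outside keyword and `["Cameras", "X100"]` as site-inside keywords, i.e. the trail of anchors from the top page down to the page. Keeping the two keyword kinds in separate lists mirrors the abstract's separate storage of site-outside and site-inside keywords, so a retrieval step could weight them differently.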
