Category based, extensible and interactive system for document retrieval

Abstract

In information retrieval (IR) systems with high-speed access, especially search engines applied to the Internet and/or corporate intranet domains to retrieve accessible documents, automatic text categorization techniques are used to support the presentation of search query results within high-speed network environments.

An integrated, automatic and open information retrieval system (100) comprises a hybrid method, based on linguistic and mathematical approaches, for automatic text categorization. It solves the problems of conventional systems by combining an automatic content recognition technique with a self-learning hierarchical scheme of indexed categories. In response to a word submitted by a requester, said system (100) retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a list of the document topics is presented to the requester, who then designates the relevant topics. The requester is then granted access only to documents assigned to the relevant topics. A knowledge database (1408) linking search terms to documents, and documents to topics, is established and maintained to speed up future searches. Additionally, new strategies are presented for dealing with the different update frequencies of changed Web sites.
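
The abstract describes the query flow only at a high level. The Python below is a minimal sketch of that flow under several assumptions that the patent does not state: a document's "word-pair pattern" is taken to be its set of adjacent word pairs (bigrams), a document is assigned a topic when it shares at least `min_overlap` pairs with that topic's stored pattern, and the knowledge database is modeled as two in-memory dictionaries. `TopicRetriever`, `word_pairs`, `search`, `topics_for` and `filter_by_topics` are hypothetical names chosen for illustration. The self-learning category hierarchy and the Web-site update strategies are not modeled.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_pairs(text: str) -> set[tuple[str, str]]:
    # Assumption: a document's "word-pair pattern" is its set of adjacent token pairs.
    tokens = tokenize(text)
    return set(zip(tokens, tokens[1:]))


class TopicRetriever:
    """Hypothetical sketch: retrieve by word, assign topics, filter by chosen topics."""

    def __init__(self, documents: dict[str, str],
                 topic_patterns: dict[str, set[tuple[str, str]]],
                 min_overlap: int = 1) -> None:
        self.documents = documents
        self.topic_patterns = topic_patterns
        self.min_overlap = min_overlap  # assumed matching threshold
        # In-memory stand-in for the knowledge database:
        # search term -> document ids, and document id -> topics.
        self.term_to_docs: dict[str, list[str]] = {}
        self.doc_to_topics: dict[str, set[str]] = {}

    def _assign_topics(self, doc_id: str) -> set[str]:
        # Match the document's word pairs against each topic pattern (cached).
        if doc_id not in self.doc_to_topics:
            pairs = word_pairs(self.documents[doc_id])
            self.doc_to_topics[doc_id] = {
                topic for topic, pattern in self.topic_patterns.items()
                if len(pairs & pattern) >= self.min_overlap
            }
        return self.doc_to_topics[doc_id]

    def search(self, word: str) -> dict[str, set[str]]:
        # Retrieve documents containing the word, reusing cached results if present.
        word = word.lower()
        if word not in self.term_to_docs:
            self.term_to_docs[word] = [
                doc_id for doc_id, text in self.documents.items()
                if word in tokenize(text)
            ]
        return {doc_id: self._assign_topics(doc_id)
                for doc_id in self.term_to_docs[word]}

    @staticmethod
    def topics_for(results: dict[str, set[str]]) -> list[str]:
        # All topics among the results; more than one means the requester must choose.
        return sorted(set().union(*results.values()))

    @staticmethod
    def filter_by_topics(results: dict[str, set[str]], selected: set[str]) -> list[str]:
        # Grant access only to documents assigned to at least one selected topic.
        return sorted(doc_id for doc_id, topics in results.items() if topics & selected)


if __name__ == "__main__":
    docs = {
        "d1": "Jaguar cars offer a powerful engine and luxury interior.",
        "d2": "The jaguar is a big cat living in the rain forest.",
        "d3": "Rain forest conservation protects the jaguar habitat.",
    }
    patterns = {
        "automotive": {("powerful", "engine"), ("luxury", "interior")},
        "wildlife": {("big", "cat"), ("rain", "forest")},
    }
    retriever = TopicRetriever(docs, patterns)
    results = retriever.search("jaguar")
    print(TopicRetriever.topics_for(results))  # ['automotive', 'wildlife']
    print(TopicRetriever.filter_by_topics(results, {"wildlife"}))  # ['d2', 'd3']
```

In this toy corpus a search for "jaguar" spans two topics, so the requester would be shown both and, after choosing `wildlife`, would receive only `d2` and `d3`. Repeating the search reuses the cached term-to-document and document-to-topic entries instead of rescanning the corpus, which is one simple reading of how the knowledge database could speed up future searches.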

Bibliographic data

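  • Publication number: US2005108200A1
  • Publication date: 2005-05-19
  • Original format: PDF
  • Applicants/Assignees: FRANK MEIK; MICHAEL WIELSCH
  • Application number: US20040482833
  • Inventors: MICHAEL WIELSCH; FRANK MEIK
  • Filing date: 2001-07-04
  • Classification: G06F17/30
  • Country: US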
