Method and system for searching for relevant documents from a text database collection, using statistical ranking, relevancy feedback and small pieces of text

Abstract

Search system and method for retrieving relevant documents from a text database collection comprising patents, medical and legal documents, journals, news stories and the like. Each small piece of text within the documents, such as a sentence, phrase or semantic unit, is treated as a document. Natural language queries are used to search the database for relevant documents. A first search query creates a selected group of documents. Each word in both the search query and the documents is given a weighted value. Combining the weighted values creates a similarity value for each document, and the documents are then ranked according to their relevance to the search query. A user reading through this ranked list checks off which documents are relevant and which are not. The system then automatically updates the original search query into a second search query, which can include the same words, fewer words or different words than the first search query. Words in the second search query can have the same or different weights compared to the first search query. The system automatically searches the text database and creates a second group of documents, which at a minimum omits at least one of the documents found in the first group. The second group can also include additional documents not found in the first group. The ranking of documents in the second group differs from the first ranking such that the more relevant documents are found closer to the top of the list.
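The loop the abstract describes (rank small text units by summed term weights, collect relevance judgments, reweight the query, rank again) can be sketched roughly as follows. This is a minimal illustration and not the patented method: the abstract does not state how weights are computed, so the tf-idf weighting, the Rocchio-style update with `alpha`, `beta` and `gamma`, the tiny stopword list, and every function name here are assumptions made for the example.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny stopword list (not from the source).
STOPWORDS = {"the", "a", "of", "on", "by"}

def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def split_sentences(doc):
    """Treat each sentence as its own small 'document'."""
    return [s.strip() for s in re.split(r"[.!?]+", doc) if s.strip()]

def idf_table(units):
    """Assumed weighting: idf = log(1 + N / df) over the sentence units."""
    n = len(units)
    df = Counter(t for u in units for t in set(tokenize(u)))
    return {t: math.log(1 + n / d) for t, d in df.items()}

def weigh(text, idf):
    """Weight each known word by term frequency times idf."""
    tf = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def similarity(query_w, unit_w):
    """Combine weights: sum of products over the query's words."""
    return sum(w * unit_w.get(t, 0.0) for t, w in query_w.items())

def rank(query_w, units, idf):
    """Return (score, unit) pairs with positive score, best first."""
    scored = [(similarity(query_w, weigh(u, idf)), u) for u in units]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])

def feedback(query_w, relevant, non_relevant, idf,
             alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update (an assumption): boost words from units marked
    relevant, penalize words from units marked non-relevant."""
    new_w = {t: alpha * w for t, w in query_w.items()}
    for sign, coeff, group in ((1, beta, relevant), (-1, gamma, non_relevant)):
        for u in group:
            for t, w in weigh(u, idf).items():
                new_w[t] = new_w.get(t, 0.0) + sign * coeff * w / len(group)
    # Words whose weight falls to zero or below drop out of the query.
    return {t: w for t, w in new_w.items() if w > 0}

docs = [
    "The patent covers a search engine. The engine ranks documents by term weights.",
    "Doctors review medical records. A search of medical journals found new results.",
    "The court ruled on the patent dispute.",
]
units = [s for d in docs for s in split_sentences(d)]
idf = idf_table(units)

query = weigh("patent search", idf)
first = rank(query, units, idf)

# Hypothetical user judgments on the first ranked list.
query2 = feedback(query,
                  relevant=["The court ruled on the patent dispute"],
                  non_relevant=["A search of medical journals found new results"],
                  idf=idf)
second = rank(query2, units, idf)

for score, unit in second:
    print(f"{score:.2f}  {unit}")
```

In this toy corpus the second query gains the words from the sentence marked relevant and gives less weight to "search", so that sentence moves from below the top of the first list to the top of the second. The sketch only reorders and reweights; it does not reproduce the abstract's guarantee that the second group omits at least one document from the first.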

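Bibliographic details

  • Publication number: US5642502A
  • Patent type:
  • Publication date: 1997-06-24
  • Original format: PDF
  • Applicant/patentee: UNIVERSITY OF CENTRAL FLORIDA
  • Application number: US19940350334
  • Inventor: JAMES R. DRISCOLL
  • Filing date: 1994-12-06
  • Classification: G06F17/30; G06F7/00
  • Country: US
  • Added to database: 2022-08-22 03:09:53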
