STATISTICAL THESAURUS, METHOD OF FORMING SAME, AND USE THEREOF IN QUERY EXPANSION IN AUTOMATED TEXT SEARCHING

Abstract

A statistical thesaurus is built dynamically from the same text collection that is being searched, which improves the generation of expanded query terms. The thesaurus is dynamic in that its records are collected, ranked, accessed, and applied dynamically. Thesaurus "records" are formed as indexed documents arranged in "collections", and the collections are preferably distinguished by text source. Each record has terms assembled in indexed groups that inherently reflect a ranking based on relevance to an initial query. After an initial query is received, the appropriate collection(s) of records may be searched by a conventional search and retrieval engine; because of the record indexing scheme, the searches inherently return records ranked by degree of relevance. A record ranking scheme avoids contamination of relevant records by less relevant records. The record selection and the expansion query term generation processes are each divided into parallel threads. The separate threads correspond to respective text sources, so that the improved expansion query terms can be generated in real time.
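The flow described in the abstract can be illustrated with a small Python sketch. This is not the patented method: the co-occurrence counting, the scoring weights, the `top_records` cutoff standing in for the record ranking scheme, and every name (`build_records`, `expand_from_collection`, `expand_query`) are assumptions made for illustration. It only shows the general shape: one collection of thesaurus records per text source, records ranked against the initial query, and one thread per source producing expansion terms.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_records(docs):
    """Build one thesaurus record per head term from a source's own texts.

    Assumption: a record is simply the co-occurrence counts of the terms
    that appear in the same document as the head term.
    """
    records = defaultdict(Counter)
    for text in docs:
        terms = set(text.lower().split())
        for head in terms:
            for other in terms - {head}:
                records[head][other] += 1
    return dict(records)


def expand_from_collection(records, query_terms, top_records=2, top_terms=3):
    """Rank the records of one collection against the query, return expansion terms."""
    scored = []
    for head, related in records.items():
        # Hypothetical scoring: a head-term match counts double,
        # each query term found among the related terms counts once.
        score = (2 if head in query_terms else 0)
        score += sum(1 for t in query_terms if t in related)
        if score > 0:
            scored.append((score, head, related))
    scored.sort(key=lambda r: (-r[0], r[1]))

    # Only the best-ranked records contribute, so weaker records
    # cannot dilute the terms proposed by stronger ones.
    votes = Counter()
    for score, _head, related in scored[:top_records]:
        for term, count in related.items():
            if term not in query_terms:
                votes[term] += score * count
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_terms]]


def expand_query(collections_by_source, query, **kwargs):
    """Run one expansion thread per text source and collect the results."""
    query_terms = set(query.lower().split())
    workers = max(1, len(collections_by_source))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            source: pool.submit(expand_from_collection, records, query_terms, **kwargs)
            for source, records in collections_by_source.items()
        }
        return {source: f.result() for source, f in futures.items()}


# Two made-up text sources; the thesaurus is built from the searched texts themselves.
sources = {
    "news": ["jaguar car engine speed", "jaguar car dealer price", "car engine repair"],
    "wildlife": ["jaguar cat jungle prey", "jaguar cat habitat jungle"],
}
collections_by_source = {name: build_records(docs) for name, docs in sources.items()}
print(expand_query(collections_by_source, "jaguar", top_terms=2))
```

Keeping a separate collection per source means the same query term can expand differently depending on where the text came from, and the per-source threads are independent, so they can run concurrently.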
