
STATISTICAL THESAURUS, METHOD OF FORMING SAME, AND USE THEREOF IN QUERY EXPANSION IN AUTOMATED TEXT SEARCHING



Abstract

A statistical thesaurus is built dynamically from the same text collection that is being searched, which improves the generation of expanded query terms. The thesaurus is dynamic in that its records are collected, ranked, accessed, and applied dynamically. Thesaurus "records" are actually formed as indexed documents arranged in "collections", which are preferably distinguished by text source. Each record has terms assembled in indexed groups that inherently reflect a ranking based on relevance to an initial query. After an initial query is received, a conventional search and retrieval engine may search the appropriate collection(s) of records; because of the record indexing scheme, these searches inherently return records ranked by degree of relevance. A record ranking scheme prevents less relevant records from contaminating relevant ones. The record selection and expansion query term generation processes are each divided into parallel threads, with separate threads corresponding to respective text sources, so that the improved expansion query terms can be generated in real time.
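The abstract does not state which statistics, ranking rule, or cutoff the thesaurus uses. As a rough illustration only, the sketch below assumes: records ranked by how many distinct query terms they contain, a hypothetical `top_k` cutoff standing in for the "record ranking scheme" that keeps less relevant records out, candidate terms scored by how many top-ranked records they co-occur in, and one worker thread per text source whose results are merged. All function names (`tokenize`, `rank_records`, `expansion_terms_for_source`, `expand_query`) and parameters are hypothetical.

```python
# Hypothetical sketch: per-source query expansion from a statistical thesaurus
# built out of the same collection being searched. The ranking rule, top_k
# cutoff, and co-occurrence counting are illustrative assumptions, not the
# patented method.
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_records(query_terms, records):
    """Assumed relevance rule: number of distinct query terms in a record.

    Records sharing no query term are dropped; ties keep input order.
    """
    scored = []
    for rec in records:
        overlap = len(set(tokenize(rec)) & query_terms)
        if overlap:
            scored.append((overlap, rec))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [rec for _, rec in scored]


def expansion_terms_for_source(query, records, top_k=3, n_terms=5):
    """Score candidate terms from one text source's top-ranked records.

    Using only the top_k records keeps less relevant records from
    contaminating the term statistics (assumed cutoff strategy).
    """
    query_terms = set(tokenize(query))
    counts = Counter()
    for rec in rank_records(query_terms, records)[:top_k]:
        # Count each non-query term once per record (record frequency).
        counts.update(set(tokenize(rec)) - query_terms)
    # Highest count first; alphabetical tie-break keeps output deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]


def expand_query(query, collections, top_k=3, n_terms=5):
    """Run one worker thread per text source, then merge per-source scores."""
    with ThreadPoolExecutor(max_workers=max(1, len(collections))) as pool:
        futures = [
            pool.submit(expansion_terms_for_source, query, recs, top_k, n_terms)
            for recs in collections.values()
        ]
        merged = Counter()
        for fut in futures:
            merged.update(dict(fut.result()))  # add scores across sources
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n_terms]]


if __name__ == "__main__":
    collections = {
        "news": [
            "stock market crash worries investors",
            "investors flee stock market",
            "weather forecast calls for rain",
        ],
        "legal": [
            "securities fraud in stock market",
            "securities regulators probe market",
            "contract dispute in court",
        ],
    }
    print(expand_query("stock market", collections, top_k=2, n_terms=3))
    # → ['investors', 'securities', 'crash']
```

Threads are used here only to mirror the per-source structure described in the abstract. In CPython the global interpreter lock limits speedup for pure-Python CPU work, so the benefit of this layout comes mainly when each thread waits on an external search engine.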


Bibliographic data

Publication number: CA2248793A1

Patent type:
Publication date: 1997-09-18

Applicant/Assignee: LEXIS-NEXIS A DIVISION OF REED ELSEVIER INC.

Application number: CA19972248793
Inventors: MILLER DAVID JAMES; HOLT JOHN D.; LU XIN ALLAN

Application date: 1997-03-07
Classification: G06F17/30
Country: CA
