Journal: IEEE Transactions on Knowledge and Data Engineering

A methodology to retrieve text documents from multiple databases

Abstract

This paper presents a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, the contents of databases are indicated approximately by database representatives. Databases are ranked using their representatives with respect to the given query. We provide a necessary and sufficient condition to rank the databases optimally. In order to satisfy this condition, we provide three estimation methods. One estimation method is intended for short queries; the other two are for all queries. Second, we provide an algorithm, OptDocRetrv, to retrieve documents from the databases according to their rank and in a particular way. We show that if the databases containing the n most similar documents for a given query are ranked ahead of other databases, our methodology will guarantee the retrieval of the n most similar documents for the query. When the number of databases is large, we propose to organize database representatives into a hierarchy and employ a best-search algorithm to search the hierarchy. It is shown that the effectiveness of the best-search algorithm is the same as that of evaluating the user query against all database representatives.
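
The two-step scheme described in the abstract (rank the databases by their representatives, then retrieve documents in rank order) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's OptDocRetrv algorithm or any of its three estimation methods: documents and queries are assumed to be sparse term-weight dictionaries with nonnegative weights, similarity is assumed to be a dot product, and the representative is a hypothetical one that stores each term's maximum weight in the database. All function names are invented for the example.

```python
import heapq


def dot(query, weights):
    """Dot-product similarity between two sparse term-weight dicts."""
    return sum(w * weights.get(term, 0.0) for term, w in query.items())


def build_representative(docs):
    """Hypothetical representative: the maximum weight of each term in the database."""
    rep = {}
    for doc in docs.values():
        for term, w in doc.items():
            rep[term] = max(rep.get(term, 0.0), w)
    return rep


def retrieve_top_n(query, databases, n):
    """Return (top-n list of (similarity, doc_id), names of databases searched).

    `databases` maps a database name to {doc_id: term-weight dict}.
    With nonnegative weights, dot(query, representative) is an upper bound
    on the similarity of every document in that database.
    """
    reps = {name: build_representative(docs) for name, docs in databases.items()}
    # Step 1: rank databases by the bound computed from their representatives.
    ranked = sorted(databases, key=lambda name: dot(query, reps[name]), reverse=True)

    top = []        # min-heap holding the best n (similarity, doc_id) pairs so far
    searched = []
    # Step 2: visit databases in rank order, stopping once no remaining
    # database can contain a document better than the current n-th best.
    for name in ranked:
        bound = dot(query, reps[name])
        if len(top) == n and top[0][0] >= bound:
            break
        searched.append(name)
        for doc_id, doc in databases[name].items():
            item = (dot(query, doc), doc_id)
            if len(top) < n:
                heapq.heappush(top, item)
            elif item > top[0]:
                heapq.heapreplace(top, item)
    return sorted(top, reverse=True), searched
```

Because the bound never underestimates a database's best document, the early stop in this sketch cannot skip a document that is strictly more similar than the n-th result. The paper's guarantee is stated differently: it holds whenever the databases containing the n most similar documents are ranked ahead of the others.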
