Journal: Information Systems

Efficiency and effectiveness of query processing in cluster-based retrieval


Abstract

Our research shows that, for large databases, cluster-based retrieval (CBR) can compete with the time efficiency and effectiveness of inverted-index-based full search (FS) without considerable additional storage overhead. The proposed CBR method employs a storage structure that blends cluster membership information into the inverted file posting lists. This approach significantly reduces the cost of the similarity calculations used for document ranking during query processing, and thus improves efficiency. For example, for in-memory computations, the new approach can reduce query processing time to 39% of that of FS. The experiments confirm that the approach is scalable and that system performance improves as database size increases. In the experiments, we use the cover-coefficient-based clustering methodology (C³M) and the Financial Times database of TREC, which contains 210,158 documents (564 MB) described by 229,748 terms, with a total of 29,545,234 inverted index elements. This study provides CBR efficiency and effectiveness experiments using the largest corpus in an environment that employs no user interaction or user-behavior assumptions for clustering.
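The abstract does not give the exact posting-list layout, so the following is only a minimal Python sketch of the general idea under stated assumptions: each posting is assumed to store `(doc_id, cluster_id, weight)`, clusters are assumed to be given in advance (for example by C³M, which is not implemented here), centroids are plain averages of member term weights, and similarity is an unnormalized inner product. The function names `build_index`, `centroids`, `cbr_search`, and `full_search` are hypothetical. It illustrates how cluster IDs stored in postings let query processing skip similarity accumulation for documents outside the best-matching clusters; it is not the authors' implementation.

```python
import heapq
from collections import defaultdict


def build_index(docs, cluster_of):
    """Build an inverted index whose postings carry each document's cluster ID.

    Assumed posting layout: (doc_id, cluster_id, weight). Keeping the cluster ID
    inside the posting lets the search loop filter by cluster without a separate
    lookup table (a hypothetical format, not the paper's exact structure).
    """
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term, w in terms.items():
            index[term].append((doc_id, cluster_of[doc_id], w))
    return index


def centroids(docs, cluster_of):
    """Average term-weight vector of each cluster (a simple centroid definition)."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for doc_id, terms in docs.items():
        c = cluster_of[doc_id]
        sizes[c] += 1
        for term, w in terms.items():
            sums[c][term] += w
    return {c: {t: w / sizes[c] for t, w in vec.items()} for c, vec in sums.items()}


def inner(q, v):
    """Unnormalized inner-product similarity between two sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in q.items())


def cbr_search(query, index, cents, n_clusters=1, top_docs=10):
    """Rank clusters by centroid similarity, then score only their members.

    Centroids are scored by brute force here for simplicity. Returns the top
    documents and the number of score-accumulator updates performed.
    """
    best = {c for c, _ in heapq.nlargest(
        n_clusters, cents.items(), key=lambda kv: inner(query, kv[1]))}
    scores = defaultdict(float)
    updates = 0
    for term, qw in query.items():
        for doc_id, c, w in index.get(term, ()):
            if c in best:  # skip postings that belong to unselected clusters
                scores[doc_id] += qw * w
                updates += 1
    ranked = heapq.nlargest(top_docs, scores.items(), key=lambda kv: kv[1])
    return ranked, updates


def full_search(query, index, top_docs=10):
    """Baseline FS: accumulate a score for every posting of every query term."""
    scores = defaultdict(float)
    updates = 0
    for term, qw in query.items():
        for doc_id, _cluster, w in index.get(term, ()):
            scores[doc_id] += qw * w
            updates += 1
    ranked = heapq.nlargest(top_docs, scores.items(), key=lambda kv: kv[1])
    return ranked, updates


if __name__ == "__main__":
    # Hypothetical toy corpus: doc_id -> {term: weight}, with a fixed clustering.
    docs = {
        0: {"stock": 1.0, "market": 0.5},
        1: {"stock": 0.8, "price": 0.6},
        2: {"football": 1.0, "match": 0.7},
        3: {"football": 0.5, "stock": 0.2},
    }
    cluster_of = {0: 0, 1: 0, 2: 1, 3: 1}
    index = build_index(docs, cluster_of)
    query = {"stock": 1.0}
    print(cbr_search(query, index, centroids(docs, cluster_of)))
    print(full_search(query, index))
```

In this toy run the query matches cluster 0 best, so CBR performs 2 accumulator updates (documents 0 and 1) while full search performs 3, also touching document 3 from cluster 1. The update count is only a rough proxy for the similarity work the abstract says CBR saves; a real system would also index centroids rather than scan them.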
