Journal: Information Processing & Management

Optimization of some factors affecting the performance of query expansion



Abstract

This paper examines the factors affecting the performance of global query expansion based on term co-occurrence data and suggests a way to maximize retrieval effectiveness. The major parameters optimized through experiments are the term similarity measure and the weighting scheme for additional terms. The evaluation of four similarity measures tested in query expansion reveals that mutual information and Yule's Y, which emphasize low-frequency terms, achieve better performance than the cosine and Jaccard coefficients, which have the reverse tendency. In the evaluation of three weighting schemes, similarity weight performs well only with short queries, whereas fixed weights of approximately 0.5 and similarity rank weights are effective with queries of any length. Furthermore, the optimal similarity rank weight, which achieves the best overall performance, seems to be the least affected by test collections and the number of additional terms. For efficient retrieval, the number of additional terms need not exceed 70 in our test collections, but the optimal number may vary according to the characteristics of the similarity measure employed. © 2003 Elsevier Ltd. All rights reserved.
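
To make the four similarity measures and three weighting schemes named in the abstract concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the formulas are the common textbook definitions over a 2x2 document co-occurrence table, and the paper may use different variants. The function names, the toy data, and in particular the linear decay used for the "rank" weight are hypothetical, since the abstract does not define the similarity rank weight.

```python
import math

def contingency(docs, x, y):
    """Count documents with both terms (a), only x (b), only y (c), neither (d)."""
    a = b = c = d = 0
    for terms in docs:
        has_x, has_y = x in terms, y in terms
        if has_x and has_y:
            a += 1
        elif has_x:
            b += 1
        elif has_y:
            c += 1
        else:
            d += 1
    return a, b, c, d

def cosine(a, b, c, d):
    denom = math.sqrt((a + b) * (a + c))
    return a / denom if denom else 0.0

def jaccard(a, b, c, d):
    denom = a + b + c
    return a / denom if denom else 0.0

def mutual_information(a, b, c, d):
    # Pointwise mutual information in bits; undefined (-inf) without co-occurrence.
    n = a + b + c + d
    if a == 0:
        return float("-inf")
    return math.log2(a * n / ((a + b) * (a + c)))

def yules_y(a, b, c, d):
    s, t = math.sqrt(a * d), math.sqrt(b * c)
    return (s - t) / (s + t) if (s + t) else 0.0

def expansion_weights(candidates, scheme, fixed=0.5):
    """Weight expansion terms; candidates is [(term, similarity)], best first."""
    k = len(candidates)
    weights = {}
    for rank, (term, sim) in enumerate(candidates, start=1):
        if scheme == "similarity":
            weights[term] = sim        # weight equals the similarity value
        elif scheme == "fixed":
            weights[term] = fixed      # same weight for every added term
        elif scheme == "rank":
            # ASSUMPTION: linear decay by rank; the abstract gives no formula.
            weights[term] = (k - rank + 1) / k
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return weights

# Toy collection: each document is a set of terms (hypothetical data).
docs = [{"query", "expansion"}, {"query", "retrieval"},
        {"expansion", "query"}, {"index"}]
table = contingency(docs, "query", "expansion")
```

With this toy collection, `table` is `(2, 1, 0, 1)`, and each measure can be called as, for example, `jaccard(*table)`. In practice the candidate list passed to `expansion_weights` would be the top-ranked terms under one measure; the abstract reports that going beyond about 70 additional terms was unnecessary in the authors' test collections.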
