Journal: Information Retrieval

TSP and cluster-based solutions to the reassignment of document identifiers

Abstract

Recent studies have shown that the size of an inverted file (IF) can be reduced by reassigning the document identifiers of the original collection, because doing so shortens the distance between the positions of documents that contain the same term. Variable-bit encoding schemes can exploit the resulting reduction in average gap and lower the number of bits needed per document pointer. This paper presents an efficient solution to the reassignment problem, which consists of reducing the dimensionality of the input data with an SVD transformation and then treating the reassignment as a Travelling Salesman Problem (TSP). We also present efficient solutions based on clustering. Finally, we combine the TSP and clustering strategies to reorder the document identifiers. We report experimental tests and performance results on two TREC text collections, obtaining good compression ratios with low running times, and we put forward the possibility of obtaining scalable solutions for web collections based on the techniques presented here.
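
The effect described in the abstract can be illustrated with a small sketch. It is not the paper's method: it skips the SVD step, uses Jaccard similarity on raw term sets, and replaces a real TSP solver with a greedy nearest-neighbour tour. Elias gamma coding stands in for "a variable-bit encoding scheme". The toy collection and all function names (`gamma_bits`, `index_bits`, `greedy_tour`, `reassign`) are assumptions made for illustration only.

```python
def gamma_bits(n):
    """Length in bits of the Elias gamma code of a positive integer n."""
    return 2 * n.bit_length() - 1


def index_bits(postings):
    """Total bits needed to store every posting list as gamma-coded d-gaps."""
    total = 0
    for doc_ids in postings.values():
        prev = 0
        for d in sorted(doc_ids):
            total += gamma_bits(d - prev)  # first gap is the identifier itself
            prev = d
    return total


def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_tour(doc_terms):
    """Nearest-neighbour tour over documents (a simple TSP heuristic).

    Starts at the smallest identifier and repeatedly visits the most
    similar unvisited document; ties go to the smaller identifier.
    """
    remaining = sorted(doc_terms)
    tour = [remaining.pop(0)]
    while remaining:
        last = doc_terms[tour[-1]]
        nxt = max(remaining, key=lambda d: (jaccard(last, doc_terms[d]), -d))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour


def reassign(postings, tour):
    """Renumber documents 1..N in tour order and rewrite the posting lists."""
    new_id = {old: new for new, old in enumerate(tour, start=1)}
    return {t: sorted(new_id[d] for d in ids) for t, ids in postings.items()}


# Hypothetical toy collection: two terms interleaved across eight documents.
postings = {"a": [1, 3, 5, 7], "b": [2, 4, 6, 8]}

# Invert the index to get the term set of each document.
doc_terms = {}
for term, ids in postings.items():
    for d in ids:
        doc_terms.setdefault(d, set()).add(term)

tour = greedy_tour(doc_terms)
before = index_bits(postings)
after = index_bits(reassign(postings, tour))
print(tour)           # [1, 3, 5, 7, 2, 4, 6, 8]
print(before, after)  # 22 12
```

In this toy case the tour places documents that share a term next to each other, so most gaps shrink to 1 and the gamma-coded index drops from 22 to 12 bits. On a real collection the quadratic similarity search above would be too slow, which is presumably why the paper reduces dimensionality first and combines the TSP with clustering.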