IEEE Transactions on Parallel and Distributed Systems

Resource-Efficient Index Shard Replication in Large Scale Search Engines

Abstract

With the rapid growth in the scale of the Web, large-scale search engines must deploy a huge number of machines to store the index files of Web content. The index files are normally divided into smaller index shards, which are often replicated so that queries can be processed in parallel. We observe from real systems that the index shard replication strategy can have a significant impact on resource usage. In this paper, we investigate the index shard replication problem with the goal of minimizing resource usage in search engine datacenters. We consider both the offline and online versions of the problem and formulate them as non-linear integer programming problems. We propose several heuristic algorithms to approximate the optimal solution and evaluate them through extensive experiments using both synthetic data and real data from commercial search engines. The results demonstrate the effectiveness of the proposed algorithms. Our work also yields many insights into how different input properties affect the performance of each algorithm. We believe that this paper will provide valuable guidance for designing index shard replication strategies in practice.
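To make the trade-off concrete, here is a minimal Python sketch of a toy version of the problem. It is NOT the authors' formulation or any of their heuristics, which the abstract does not describe. The model is a labeled assumption: each shard has a hypothetical memory footprint (`size_gb`) and query rate (`query_rate`), one replica can serve at most `per_replica_qps` queries per second, and every machine has `machine_mem_gb` of memory. The sketch fixes the replica count per shard first, then packs replicas onto machines with first-fit decreasing, a standard bin-packing heuristic swapped in for illustration, never putting two replicas of the same shard on one machine. The names `Shard`, `replicas_needed`, and `place_replicas` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Shard:
    name: str
    size_gb: float     # hypothetical: memory needed to hold one replica
    query_rate: float  # hypothetical: queries per second hitting this shard


def replicas_needed(shard: Shard, per_replica_qps: float) -> int:
    # Enough replicas that no single replica serves more than per_replica_qps.
    return max(1, math.ceil(shard.query_rate / per_replica_qps))


def place_replicas(shards, per_replica_qps, machine_mem_gb):
    """First-fit decreasing placement (an assumed heuristic, not the paper's).

    Returns a list of machines, each a list of shard names. Two replicas
    of the same shard are never placed on the same machine.
    """
    replicas = []
    for s in shards:
        if s.size_gb > machine_mem_gb:
            raise ValueError(f"shard {s.name} does not fit on any machine")
        replicas += [s] * replicas_needed(s, per_replica_qps)
    # Place the largest replicas first; sort is stable for equal sizes.
    replicas.sort(key=lambda s: s.size_gb, reverse=True)

    machines = []  # each entry: [free_mem, names_on_machine, placement_order]
    for s in replicas:
        for m in machines:
            if m[0] >= s.size_gb and s.name not in m[1]:
                m[0] -= s.size_gb
                m[1].add(s.name)
                m[2].append(s.name)
                break
        else:
            # No existing machine can take this replica: open a new one.
            machines.append([machine_mem_gb - s.size_gb, {s.name}, [s.name]])
    return [m[2] for m in machines]


if __name__ == "__main__":
    shards = [Shard("A", 6, 900), Shard("B", 4, 300), Shard("C", 3, 100)]
    layout = place_replicas(shards, per_replica_qps=400, machine_mem_gb=10)
    print(len(layout), layout)  # 3 [['A', 'B'], ['A', 'C'], ['A']]
```

In this toy instance, shard A's high query rate forces three replicas, and those replicas alone already claim three machines. A real system would likely choose replica counts and placement jointly rather than in two separate passes.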