Journal: Information Processing & Management

Document replication strategies for geographically distributed web search engines


Abstract

Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times because the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on the index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.
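The abstract does not spell out the replication algorithms, so the following is only an illustrative sketch of the general idea of selective replication under a capacity constraint, not the authors' method. It assumes a hypothetical access log of (region, document) pairs and a per-region limit on the number of indexed documents, and it greedily keeps the documents that each region requested most often. The names `select_replicas` and `local_hit_rate` are invented for this example.

```python
from collections import Counter, defaultdict


def select_replicas(access_log, capacity):
    """Pick, for each region, the documents it accessed most often.

    access_log: iterable of (region, doc_id) pairs (hypothetical input format).
    capacity:   dict mapping region -> maximum number of documents it may index.
    Returns a dict mapping region -> set of doc_ids replicated there.
    """
    counts = defaultdict(Counter)
    for region, doc_id in access_log:
        counts[region][doc_id] += 1

    replicas = {}
    for region, cap in capacity.items():
        # Rank by descending access count; break ties by doc_id for determinism.
        ranked = sorted(counts[region].items(), key=lambda kv: (-kv[1], kv[0]))
        replicas[region] = {doc_id for doc_id, _ in ranked[:cap]}
    return replicas


def local_hit_rate(access_log, replicas):
    """Fraction of logged accesses that the local data center could serve."""
    log = list(access_log)
    if not log:
        return 0.0
    hits = sum(1 for region, doc_id in log if doc_id in replicas.get(region, set()))
    return hits / len(log)


if __name__ == "__main__":
    log = [
        ("eu", "d1"), ("eu", "d1"), ("eu", "d2"), ("eu", "d3"),
        ("us", "d3"), ("us", "d2"), ("us", "d3"),
    ]
    chosen = select_replicas(log, {"eu": 1, "us": 1})
    print(chosen)
    print(local_hit_rate(log, chosen))
```

Accesses that miss the local replica set would, in a system like the one described, either be forwarded to another data center or answered with lower quality. The sketch only counts those misses and does not model forwarding, caching, or response time.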
