Journal: IEEE Journal on Selected Areas in Communications

Efficient and adaptive Web replication using content clustering


Abstract

Content distribution networks (CDNs) that offer hosting services to Web content providers are increasingly being deployed. In this paper, we first compare the uncooperative pulling of Web content used by commercial CDNs with cooperative pushing. Our results show that cooperative pushing achieves comparable user-perceived performance with only 4%-5% of the replication and update traffic of uncooperative pulling. We therefore explore how to push content to CDN nodes efficiently. Using trace-driven simulation, we show that replicating content in units of URLs can reduce clients' latency by 60%-70% compared with replicating in units of Web sites. However, such fine-grained replication is very expensive to perform. To address this issue, we propose replicating content in units of clusters, each containing objects that are likely to be requested by clients that are topologically close. To this end, we describe three clustering techniques and evaluate their performance using various topologies and several large Web server traces. Our results show that cluster-based replication achieves performance close to that of the URL-based scheme at only 1%-2% of the computation and management cost. In addition, by adjusting the number of clusters, we can smoothly trade off management and computation cost for better client performance. To adapt to changes in users' access patterns, we also explore incremental clustering, which adaptively adds new documents to the existing content clusters. We examine both offline and online incremental clustering: the former assumes that access history is available, while the latter predicts access patterns from the hyperlink structure. Our results show that offline clustering yields performance close to that of complete re-clustering at much lower overhead. Online incremental clustering and replication reduce the retrieval cost by a factor of 4.6 compared with random replication and by a factor of 8 compared with no replication. It is therefore especially useful for improving document availability during flash crowds.
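
The abstract does not spell out the three clustering techniques, so the following is only a minimal sketch of the general idea of grouping objects that are requested by the same (topologically close) clients. It assumes each URL is summarized by a vector of request counts, one entry per client group, and it greedily merges URLs whose vectors have high cosine similarity. The names `cosine`, `cluster_urls`, the `threshold` parameter, and the sample URLs are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length access-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_urls(access, threshold=0.9):
    """Greedily group URLs with similar per-client-group access vectors.

    access: dict mapping URL -> list of request counts, one per client group
            (an assumed input format, not the paper's trace format).
    Returns a list of clusters, each a list of URLs.
    """
    clusters = []  # each entry: [summed access vector, member URLs]
    # Visit the most requested URLs first so that they seed the clusters.
    for url in sorted(access, key=lambda u: -sum(access[u])):
        vec = access[url]
        best, best_sim = None, threshold
        for entry in clusters:
            sim = cosine(entry[0], vec)
            if sim >= best_sim:
                best, best_sim = entry, sim
        if best is None:
            # No existing cluster is similar enough: start a new one.
            clusters.append([list(vec), [url]])
        else:
            # Join the most similar cluster and update its summed vector.
            for i, count in enumerate(vec):
                best[0][i] += count
            best[1].append(url)
    return [members for _, members in clusters]


if __name__ == "__main__":
    sample = {
        "/news/a.html": [90, 10, 0],
        "/news/b.html": [80, 5, 0],
        "/shop/x.html": [0, 10, 70],
        "/shop/y.html": [0, 5, 60],
    }
    for members in cluster_urls(sample):
        print(members)
```

In this sketch, lowering `threshold` produces fewer, larger clusters and raising it produces more, smaller ones, which loosely mirrors the trade-off the abstract describes between management cost and client performance. Because each URL is assigned to the most similar existing cluster, the same assignment step could in principle place a newly observed document into existing clusters, in the spirit of the offline incremental clustering mentioned above.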
