Efficient and adaptive Web replication using content clustering

Yan Chen; Lili Qiu; Weiyu Chen; Luan Nguyen; Katz R.H.


Abstract

Recently, there has been an increasing deployment of content distribution networks (CDNs) that offer hosting services to Web content providers. In this paper, we first compare the uncooperative pulling of Web content used by commercial CDNs with cooperative pushing. Our results show that the latter achieves comparable user-perceived performance with only 4%-5% of the replication and update traffic required by the former. Therefore, we explore how to efficiently push content to CDN nodes. Using trace-driven simulation, we show that replicating content in units of URLs can yield a 60%-70% reduction in clients' latency, compared with replicating in units of Websites. However, such fine-grained replication is very expensive to perform. To address this issue, we propose to replicate content in units of clusters, each containing objects that are likely to be requested by clients that are topologically close. To this end, we describe three clustering techniques and use various topologies and several large Web server traces to evaluate their performance. Our results show that cluster-based replication achieves performance close to that of the URL-based scheme, but at only 1%-2% of the computation and management cost. In addition, by adjusting the number of clusters, we can smoothly trade off management and computation cost for better client performance. To adapt to changes in users' access patterns, we also explore incremental clustering that adaptively adds new documents to the existing content clusters. We examine both offline and online incremental clustering, where the former assumes access history is available while the latter predicts access patterns based on the hyperlink structure. Our results show that offline clustering yields performance close to that of complete re-clustering at much lower overhead. Online incremental clustering and replication reduce the retrieval cost by a factor of 4.6 compared with random replication and by a factor of 8 compared with no replication. It is therefore especially useful for improving document availability during flash crowds.
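
The abstract does not spell out the three clustering techniques, so the following is only a minimal illustrative sketch of the general idea of grouping objects that are requested by the same sets of clients. It assumes a trace of (client group, URL) pairs, where a "client group" stands in for a set of topologically close clients, and it uses cosine similarity with a greedy threshold rule. The function names, the trace format, and the threshold are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def access_vectors(trace):
    """Count requests per URL, broken down by client group.

    `trace` is an iterable of (client_group, url) pairs (an assumed format).
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for group, url in trace:
        vectors[url][group] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(count * b.get(group, 0) for group, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_urls(vectors, threshold=0.8):
    """Greedy clustering: a URL joins the first cluster whose seed vector
    is at least `threshold` similar, otherwise it starts a new cluster.
    This is a simple stand-in, not one of the paper's techniques."""
    clusters = []  # list of (seed_vector, member_urls)
    for url in sorted(vectors):
        vec = vectors[url]
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(url)
                break
        else:
            clusters.append((vec, [url]))
    return [members for _, members in clusters]


# Hypothetical toy trace: two client groups, four URLs.
trace = [
    ("west", "/a"), ("west", "/a"), ("west", "/b"),
    ("east", "/c"), ("east", "/d"), ("east", "/d"), ("west", "/d"),
]
print(cluster_urls(access_vectors(trace)))
# → [['/a', '/b'], ['/c', '/d']]
```

In this sketch each resulting cluster would be replicated as a single unit, so raising or lowering `threshold` changes the number of clusters and, with it, the balance between management cost and how closely placement follows each URL's own access pattern.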


Bibliographic record

Source
IEEE Journal on Selected Areas in Communications | 2003, Issue 6 | pp. 979-994 | 16 pages
Authors
Yan Chen; Lili Qiu; Weiyu Chen; Luan Nguyen; Katz R.H.
Author affiliation

Comput. Sci. Div., Univ. of California, Berkeley, CA, USA

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Radio electronics, telecommunication technology
Keywords
Internet; Web sites; file servers; replicated databases; telecommunication traffic; digital simulation; network topology; adaptive Web replication; content distribution networks; hosting services; Web content providers; content clustering; cooperativ;


