The exponential growth of the World Wide Web has made it the most popular information dissemination tool in the world. This growth necessitates caching, prefetching and replication schemes on the Web to alleviate Web server load, conserve network bandwidth and reduce retrieval latency. At the same time, cache consistency must be maintained to avoid returning stale pages to users. Studying the characteristics of Web documents helps determine which schemes should be adopted in a specific Web environment. This paper presents a characterization of Web documents based on their types and environment classes. Nine Web server traces are used, representing three different classes of Web environments: educational, commercial and news. The results indicate significant differences in both the static and the dynamic characteristics of document types across Web site classes. The efficiency of caching can be improved if caching priorities and time-to-live (TTL) preferences are assigned to certain types of Web documents. We also provide guidelines for the design and development of caching and prefetching techniques that can exploit Web document characteristics in different environments.