Workload characterization, performance modeling, workload and performance forecasting, and capacity planning are fundamental to the growth of Web services and applications. Previous studies have primarily focused on the complexity of Web traffic at the level of object hits or page views. In contrast, our study focuses on higher-level characteristics and introduces techniques for profiling, clustering, and classification of Web site traffic. In particular, we devise novel, efficient, and automated techniques for extracting Web traffic patterns from access logs, for clustering such traffic patterns, and for classifying Web traffic based on the extraction and clustering of traffic templates. Our approach has been applied to more than 25 existing commercial Web sites, and it has been demonstrated to accurately capture and characterize the complexities of Web traffic in commercial Web sites. These methods provide new solutions to challenging problems such as workload and performance prediction and short-term and long-term capacity planning.