Detecting anomalies is essential for improving the reliability of enterprise applications. Current approaches either set thresholds for individual metrics or model correlations between metrics, and they report an anomaly when a threshold is violated or a correlation is broken. However, we have found that dynamic workloads, which fluctuate over multiple time scales, cause system metrics and their correlations to change. Moreover, it is difficult to model the various metric correlations in complex applications. This paper addresses these problems and proposes an online anomaly detection approach for enterprise applications. We present a method for recognizing workload patterns with an incremental clustering algorithm, and we adopt the Local Outlier Factor (LOF), computed with respect to the specific workload pattern, for detecting anomalies. Our approach is evaluated on a testbed running the TPC-W benchmark. The experimental results show that our approach can capture workload fluctuations accurately and detect typical faults effectively.
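To make the LOF step concrete, the following is a minimal sketch of the standard Local Outlier Factor computation in plain Python. It is not the paper's implementation: the function names, the neighbourhood size `k=3`, and the sample metric values are all assumptions chosen for illustration, and the sketch scores a single set of points rather than scoring each sample against its recognized workload pattern. It also ignores distance ties and assumes no duplicate points.

```python
import math

def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of points[i]."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points)) if j != i
    )
    return dists[:k]

def lof_scores(points, k=3):
    """Compute a simplified Local Outlier Factor for every point.

    Scores near 1 mean the point is about as dense as its neighbours;
    scores well above 1 mean it lies in a sparser region (a likely outlier).
    """
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neighbours]
    # local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours divided by the point's own density
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical (CPU utilisation, memory utilisation) samples for one workload
# pattern; the last sample is deliberately far from the rest.
metrics = [
    (0.50, 0.40), (0.52, 0.41), (0.49, 0.43),
    (0.51, 0.39), (0.48, 0.42), (0.95, 0.90),
]
scores = lof_scores(metrics, k=3)
for point, score in zip(metrics, scores):
    print(point, round(score, 2))
```

In this sketch the five clustered samples score close to 1, while the distant sample scores far higher. In an online setting, one would presumably keep a separate reference set per workload pattern and score each new sample against the set for its current pattern.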