Density estimation, which computes the distribution of a data set, is a costly operation underlying many important data mining applications, such as clustering and biased sampling. However, traditional density estimation methods are inapplicable to streaming data, which arrive continuously and in large volumes, because they require storage linear in the data size and a quadratic amount of computation. This shortcoming limits the application of many effective existing algorithms to data streams, where mining is both an urgent need for applications and a challenge for research. In this paper, the problem of computing density functions over data streams is examined. A novel method that addresses this shortcoming is developed to enable density estimation for large volumes of data in linear time and fixed-size memory, without loss of accuracy. The method is based on M-Kernel merging, which intelligently determines the limited set of kernel functions to be maintained. The application of the new method to different streaming data models is discussed, and the results of extensive experiments are presented. The analytical and empirical results show that the new density estimation algorithm for data streams can compute density functions on demand at any time with high accuracy for different streaming data models.
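To illustrate the general idea of keeping only a bounded number of kernels over a stream, the following is a minimal sketch and not the M-Kernel algorithm itself. It assumes one-dimensional data and Gaussian kernels, and it merges the two adjacent kernels with the closest means by moment matching; this merge rule, the class name `MergingKDE`, and the parameters `max_kernels` and `bandwidth` are all assumptions made for illustration, since the abstract does not specify how kernels are selected or merged.

```python
import bisect
import math


class MergingKDE:
    """Illustrative fixed-budget 1-D Gaussian kernel density estimate.

    Hypothetical sketch: the merge rule (closest adjacent means, moment
    matching) is an assumption, not the rule used by M-Kernel merging.
    """

    def __init__(self, max_kernels, bandwidth):
        self.max_kernels = max_kernels
        self.bandwidth = bandwidth
        self.kernels = []  # (mean, variance, weight), kept ordered by mean
        self.count = 0     # number of stream items seen so far

    def add(self, x):
        # Each arriving point starts as its own kernel with unit weight.
        bisect.insort(self.kernels, (x, self.bandwidth ** 2, 1.0))
        self.count += 1
        # Enforce the memory budget with one merge per arrival.
        if len(self.kernels) > self.max_kernels:
            self._merge_closest_pair()

    def _merge_closest_pair(self):
        # Pick the adjacent pair whose means are closest (O(max_kernels)).
        i = min(range(len(self.kernels) - 1),
                key=lambda j: self.kernels[j + 1][0] - self.kernels[j][0])
        (m1, v1, w1), (m2, v2, w2) = self.kernels[i], self.kernels[i + 1]
        # Moment matching: keep the pair's total weight, mean, and variance.
        w = w1 + w2
        m = (w1 * m1 + w2 * m2) / w
        v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
        # The merged mean lies between m1 and m2, so the ordering by mean
        # is preserved (up to floating-point rounding).
        self.kernels[i:i + 2] = [(m, v, w)]

    def density(self, x):
        # Weighted Gaussian mixture, normalized by the number of items seen.
        if self.count == 0:
            return 0.0
        total = 0.0
        for m, v, w in self.kernels:
            total += w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        return total / self.count


# Usage: feed a stream one item at a time, then query at any point.
estimator = MergingKDE(max_kernels=16, bandwidth=0.3)
for value in [0.1, 0.4, 0.35, 2.0, 2.2, 1.9, 0.2, 2.1]:
    estimator.add(value)
print(estimator.density(0.3), estimator.density(2.0))
```

Under these assumptions, each arrival costs time proportional to `max_kernels`, so processing the whole stream is linear in its length, and memory never exceeds `max_kernels` kernels. The density can be queried at any time between arrivals, which mirrors the on-demand property claimed above, although the accuracy of this simplified merge rule is not analyzed here.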