Stream data mining has been an active research topic in recent years. To improve the efficiency of stream data mining, this paper designs an online stream data clustering algorithm, IStrAP. IStrAP takes into account the characteristics of stream data, namely that it is potentially unbounded, arrives rapidly, and cannot be scanned repeatedly once it has passed, and adds an outlier elimination step to the existing StrAP algorithm. IStrAP statistically analyzes the data held in the reservoir (a temporary storage area) to obtain statistics and parameters that reflect the characteristics of the data, removes abnormal data from the reservoir according to these statistical properties, and then clusters the data remaining in the reservoir. The experimental results show that IStrAP eliminates outliers effectively, achieves higher clustering accuracy and lower time complexity than the existing StrAP algorithm, and adapts better to dynamic changes in the stream data.
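The reservoir-filtering step described above can be illustrated with a minimal sketch. The abstract does not say which statistics IStrAP computes, so this example assumes a simple rule: a point is treated as abnormal when its distance to the reservoir centroid exceeds the mean distance by more than `k` standard deviations. The function name `filter_reservoir`, the parameter `k`, and the rule itself are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, pstdev


def filter_reservoir(reservoir, k=2.0):
    """Split reservoir points into (kept, removed).

    ASSUMPTION: the outlier rule (distance to centroid greater than
    mean + k * std of all such distances) is illustrative only; the
    paper's actual statistics are not given in the abstract.
    """
    if len(reservoir) < 2:
        # Too few points to estimate any spread; keep everything.
        return list(reservoir), []

    dim = len(reservoir[0])
    # Centroid of the reservoir, computed coordinate by coordinate.
    centroid = [mean(p[i] for p in reservoir) for i in range(dim)]
    # Euclidean distance of every point to the centroid.
    dists = [math.dist(p, centroid) for p in reservoir]

    # Summary statistics of the distances (population std deviation).
    threshold = mean(dists) + k * pstdev(dists)

    kept, removed = [], []
    for point, d in zip(reservoir, dists):
        (removed if d > threshold else kept).append(point)
    return kept, removed


if __name__ == "__main__":
    reservoir = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
    kept, removed = filter_reservoir(reservoir)
    print("kept:", kept)
    print("removed:", removed)
```

In this sketch, only the points in `kept` would be passed on to the clustering step, while those in `removed` would be discarded as outliers. The clustering itself is left out, since the abstract gives no detail beyond naming StrAP as the base algorithm.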