As the tractable amount of data grows in the computer science area, fast clustering algorithms are required, because traditional clustering algorithms are not feasible for very large and high-dimensional data. Many studies have been reported on the clustering of large databases, but most of them circumvent this problem by using an approximation method, resulting in the deterioration of accuracy. In this paper, we propose a new clustering algorithm by means of a partial maximum array, which can realize agglomerative hierarchical clustering with the same accuracy as the brute-force algorithm and has O(N/sup 2/) time complexity. We also present an incremental method of similarity computation which substitutes a scalar calculation for the time-consuming calculation of vector similarity. Experimental results show that clustering becomes significantly fast for large and high-dimensional data.
展开▼