This research proposes a new strategy for text clustering in which documents are encoded into string vectors, together with modified versions of the single pass algorithm that are adapted to string vectors. Traditionally, when the single pass algorithm is used for pattern clustering, raw data must first be encoded into numerical vectors. This encoding can be difficult, depending on the application area. In text clustering, for example, encoding full texts into numerical vectors leads to two main problems: huge dimensionality and sparse distribution. To address these two problems, we encode full texts into string vectors and apply the single pass algorithm to these string vectors for text clustering.
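The idea can be illustrated with a minimal sketch. It is not the method of this research: the encoding rule (the d most frequent words of a text), the word-overlap similarity, the averaging over cluster members, and the names `encode`, `similarity`, and `single_pass` are all assumptions chosen only to show how a single pass algorithm might operate on string vectors instead of numerical vectors.

```python
from collections import Counter


def encode(text, d=3):
    """Encode a text as a string vector (assumed rule: its d most frequent words)."""
    counts = Counter(text.lower().split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    top = ranked[:d]
    # Pad short texts with empty strings so every vector has exactly d elements.
    return tuple(top + [""] * (d - len(top)))


def similarity(u, v):
    """Assumed similarity: shared non-empty words divided by the vector length."""
    shared = (set(u) & set(v)) - {""}
    return len(shared) / len(u)


def single_pass(vectors, threshold=0.5):
    """Single pass clustering: each vector is visited once, in order.

    A vector joins the existing cluster with the highest average similarity
    if that similarity reaches the threshold; otherwise it starts a new cluster.
    Clusters are returned as lists of indices into `vectors`.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        best_cluster, best_sim = None, -1.0
        for cluster in clusters:
            avg = sum(similarity(vec, vectors[j]) for j in cluster) / len(cluster)
            if avg > best_sim:
                best_cluster, best_sim = cluster, avg
        if best_cluster is not None and best_sim >= threshold:
            best_cluster.append(i)
        else:
            clusters.append([i])
    return clusters


docs = [
    "cat cat dog pet",
    "dog dog cat pet",
    "stock stock market trade",
    "market market stock price",
]
string_vectors = [encode(doc) for doc in docs]
clusters = single_pass(string_vectors, threshold=0.5)
print(clusters)
```

Each text is represented by a fixed-length tuple of words, so the representation stays small however large the vocabulary is. The result of a single pass algorithm depends on the threshold and on the order in which the items are presented.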