In this research, we propose a similarity-matrix-based version of NTSO (Neural Text Self Organization) as an approach to text clustering. Traditional approaches to text clustering require documents to be encoded into numerical vectors, and this encoding causes two main problems: huge dimensionality and sparse distribution. To solve these problems, we propose encoding documents into string vectors and using NTSO, a string-vector-based neural network, for text clustering. By encoding documents in this different form, we attempt to avoid both problems completely. For empirical validation, the proposed approach will be compared with other approaches in terms of clustering performance and speed.
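The abstract does not specify how string vectors or the similarity matrix are built, so the following Python sketch is only one plausible reading under stated assumptions. It hypothetically assumes that a string vector is the d most frequent non-stopword words of a document, that the word similarity matrix holds the Jaccard overlap of the sets of documents containing each pair of words, and that two string vectors are compared by averaging the similarities of the words at corresponding positions. The names `to_string_vector`, `build_similarity_matrix`, and `string_vector_similarity` are illustrative, not taken from the NTSO implementation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_string_vector(text, d, stopwords=frozenset()):
    """Hypothetical encoding: the d most frequent non-stopword words, most frequent first."""
    counts = Counter(w for w in tokenize(text) if w not in stopwords)
    # Sort by descending frequency; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    words = [w for w, _ in ranked[:d]]
    # Pad with empty strings so every document yields exactly d slots.
    return words + [""] * (d - len(words))


def build_similarity_matrix(docs):
    """Hypothetical word similarity: Jaccard overlap of the documents containing each word."""
    postings = {}
    for i, doc in enumerate(docs):
        for w in set(tokenize(doc)):
            postings.setdefault(w, set()).add(i)
    matrix = {}
    for a, docs_a in postings.items():
        for b, docs_b in postings.items():
            # Every word occurs in at least one document, so the union is never empty.
            matrix[(a, b)] = len(docs_a & docs_b) / len(docs_a | docs_b)
    return matrix


def string_vector_similarity(u, v, matrix):
    """Average the word similarities at corresponding positions; unknown or padded pairs score 0."""
    return sum(matrix.get((a, b), 0.0) for a, b in zip(u, v)) / len(u)


STOPWORDS = frozenset({"the", "a", "on", "with", "and", "as"})
docs = [
    "the cat sat on the mat with the cat",
    "a dog and a cat played",
    "stock prices fell as the market closed",
]
matrix = build_similarity_matrix(docs)
vectors = [to_string_vector(doc, 3, STOPWORDS) for doc in docs]
print(vectors[0])  # ['cat', 'mat', 'sat']
print(string_vector_similarity(vectors[0], vectors[1], matrix))  # 0.3333333333333333
```

In this sketch every document occupies exactly d slots, however large the vocabulary is, so there are no zero-filled dimensions; the knowledge about how words relate is moved into the corpus-wide similarity matrix instead.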