For clustering short-message text, this paper designs a hybrid method that combines frequent term-set clustering with the Ant-Tree algorithm. The method exploits the efficiency of frequent term-set clustering on text data to generate initial clusters, then computes silhouette coefficients to eliminate overlapping documents. The Ant-Tree algorithm then refines these clusters further to produce high-quality output. The clustering results also retain descriptive information and a tree-like hierarchical structure, which supports a wider range of applications.
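The overlap-elimination step can be illustrated with a minimal sketch. The abstract does not give the paper's exact procedure, so everything here is an assumption made for illustration: documents are modelled as sets of terms, distance is Jaccard distance, and each document that falls into several initial clusters is kept only in the cluster where its silhouette value is highest. The function names (`jaccard_distance`, `mean_distance`, `silhouette`, `remove_overlap`) and the toy data are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two term sets (assumed metric, not from the paper)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_distance(doc_id, members, docs):
    """Mean distance from doc_id to the other members of a cluster."""
    others = [m for m in members if m != doc_id]
    if not others:
        return 0.0
    return sum(jaccard_distance(docs[doc_id], docs[m]) for m in others) / len(others)


def silhouette(doc_id, own, clusters, docs):
    """Silhouette value of doc_id if it is treated as a member of cluster `own`."""
    if len(clusters[own] - {doc_id}) == 0:
        return 0.0  # usual convention for a cluster holding only this document
    a = mean_distance(doc_id, clusters[own], docs)
    rivals = [
        mean_distance(doc_id, members, docs)
        for label, members in clusters.items()
        if label != own and any(m != doc_id for m in members)
    ]
    if not rivals:
        return 0.0
    b = min(rivals)  # nearest other cluster
    denom = max(a, b)
    return 0.0 if denom == 0 else (b - a) / denom


def remove_overlap(clusters, docs):
    """Keep each overlapping document only in its best-silhouette cluster."""
    resolved = {label: set(members) for label, members in clusters.items()}
    for doc_id in docs:
        candidates = [label for label, members in clusters.items() if doc_id in members]
        if len(candidates) < 2:
            continue  # document is not overlapping
        best = max(candidates, key=lambda label: silhouette(doc_id, label, clusters, docs))
        for label in candidates:
            if label != best:
                resolved[label].discard(doc_id)
    return resolved


# Toy data: five short messages as term sets (hypothetical).
docs = {
    0: {"meeting", "tomorrow", "room"},
    1: {"meeting", "tomorrow", "agenda"},
    2: {"meeting", "lunch", "pizza"},
    3: {"lunch", "pizza", "noon"},
    4: {"lunch", "pizza", "order"},
}
# Initial clusters keyed by a frequent term; document 2 contains both terms.
clusters = {"meeting": {0, 1, 2}, "lunch": {2, 3, 4}}

resolved = remove_overlap(clusters, docs)
```

In this toy input, document 2 shares two terms with each "lunch" message but only one with each "meeting" message, so it is kept in the "lunch" cluster and removed from "meeting". The frequent term that defines each cluster can serve as its description, and the non-overlapping clusters would then be handed to the Ant-Tree refinement stage, which is not sketched here.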