Journal: 计算机仿真 (Computer Simulation)

Research and Simulation of a Weighted Fuzzy C-Means Text Clustering Algorithm

     

Abstract

This paper studies text clustering. Traditional text clustering algorithms assume that every attribute (feature term) contributes equally to the clustering result and treat all attributes alike during clustering, which lowers clustering accuracy. Some algorithms assign larger weights to important attributes during clustering, but at the cost of higher time complexity. To address this problem, the paper proposes a new clustering algorithm, the attribute-weighted fuzzy c-means algorithm. The algorithm determines the weight of each attribute during its iterations without reducing its efficiency. Attributes that yield a smaller sum of within-cluster distances receive larger weights; the others receive smaller weights. Repeated simulations on test documents indicate that the proposed algorithm achieves good computation speed and accuracy and marks the differing importance of each attribute. It can provide a reliable basis for the design of automatic document summarization, digital library services, and automatic document collection organization systems.
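The abstract does not state the update formulas, so the following is only a minimal sketch of how an attribute-weighted fuzzy c-means iteration could look, not the paper's own method. It assumes the objective J = Σ_i Σ_k u_ik^m Σ_j w_j^β (x_kj − v_ij)² with Σ_j w_j = 1, and a weight rule w_j ∝ D_j^(−1/(β−1)), where D_j is the membership-weighted sum of within-cluster squared distances on attribute j. The function name `weighted_fcm`, the exponent `beta`, and the deterministic initialization are all illustrative assumptions.

```python
def weighted_fcm(X, c, m=2.0, beta=2.0, iters=30, eps=1e-12):
    """Attribute-weighted fuzzy c-means sketch (assumed update rules).

    X is a list of n points, each a list of d floats. Returns (U, V, w):
    U[k][i] is the membership of point k in cluster i, V holds the c
    centers, and w holds the d attribute weights (summing to 1).
    Requires m > 1 and beta > 1.
    """
    n, d = len(X), len(X[0])
    w = [1.0 / d] * d  # start with equal attribute weights
    # Assumption: naive deterministic start from evenly spaced data points.
    V = [list(X[(i * n) // c]) for i in range(c)]
    U = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # 1) Memberships from attribute-weighted squared distances.
        for k in range(n):
            dist = [
                sum(w[j] ** beta * (X[k][j] - V[i][j]) ** 2 for j in range(d)) + eps
                for i in range(c)
            ]
            for i in range(c):
                U[k][i] = 1.0 / sum(
                    (dist[i] / dist[l]) ** (1.0 / (m - 1.0)) for l in range(c)
                )
        # 2) Centers as membership-weighted means of the data.
        for i in range(c):
            um = [U[k][i] ** m for k in range(n)]
            total = sum(um)
            V[i] = [sum(um[k] * X[k][j] for k in range(n)) / total for j in range(d)]
        # 3) Attribute weights: a smaller within-cluster distance sum D[j]
        #    on attribute j gives a larger weight w[j].
        D = [
            sum(U[k][i] ** m * (X[k][j] - V[i][j]) ** 2
                for k in range(n) for i in range(c)) + eps
            for j in range(d)
        ]
        inv = [Dj ** (-1.0 / (beta - 1.0)) for Dj in D]
        s = sum(inv)
        w = [v / s for v in inv]
    return U, V, w
```

Called on a small two-attribute data set in which the first attribute separates two tight groups and the second is scattered, this sketch should give the first attribute the larger weight. Under these assumptions the weight update costs one extra pass over the data per iteration, which is the same order as the membership update, consistent with the abstract's claim that weighting does not hurt efficiency.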
