Published in: Information Retrieval Technology (conference proceedings)

A Novel Fuzzy Kernel C-Means Algorithm for Document Clustering



Abstract

Compared with the classical Fuzzy C-Means (FCM) algorithm, the Fuzzy Kernel C-Means (FKCM) algorithm can significantly improve accuracy on data that are not linearly separable, are high-dimensional, or contain overlapping clusters in the input space. Despite these advantages, several weaknesses limit its real-world application: convergence to local optima, sensitivity to outliers, the need to specify the number of clusters c in advance, and slow convergence. To overcome these disadvantages, semi-supervised learning and a cluster validity index are employed. Semi-supervised learning uses a small amount of labeled data to guide the clustering of a large body of unlabeled data, which helps FKCM avoid the drawbacks listed above. The number of clusters strongly affects clustering performance, and the optimal number cannot be assumed in advance, especially for large text corpora. A validity function makes it possible to determine a suitable number of clusters during the clustering process. A sparse storage format and a scatter-gather strategy save considerable storage space and computation time. Experimental results on the Reuters-21578 benchmark dataset show that the proposed algorithm is more flexible and more accurate than the state-of-the-art FKCM.
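The abstract names the ingredients but not the exact update rules. The sketch below is a minimal illustration under stated assumptions, not the paper's method: it uses one common FKCM variant (prototypes kept in input space, a Gaussian kernel, and the kernel-induced distance d² = 2(1 - K(x, v))), treats labeled points as having fixed one-hot memberships that also seed the centers, and uses the Xie-Beni index as the validity function for choosing c. All function names (`semi_supervised_kfcm`, `xie_beni`, `choose_c`) and parameters are hypothetical, and small dense vectors stand in for document vectors.

```python
import math
import random


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / sigma^2) (assumed form)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / sigma ** 2)


def semi_supervised_kfcm(data, c, labels=None, m=2.0, sigma=1.0,
                         max_iter=100, tol=1e-5, seed=0):
    """Kernel fuzzy c-means with optional partial supervision (a sketch).

    labels: hypothetical dict {point index: cluster index < c}; labeled points
    keep a fixed one-hot membership and seed their cluster's initial center.
    Returns (u, centers) where u[k][i] is the membership of point k in cluster i.
    """
    labels = labels or {}
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    centers = []
    for i in range(c):
        members = [data[k] for k, lab in labels.items() if lab == i]
        if members:  # seed from the mean of the labeled points of cluster i
            centers.append([sum(col) / len(members) for col in zip(*members)])
        else:        # otherwise start from a random data point
            centers.append(list(rng.choice(data)))
    u = [[1.0 / c] * c for _ in range(n)]
    for _ in range(max_iter):
        new_u = []
        for k, x in enumerate(data):
            if k in labels:  # supervision: labeled memberships stay fixed
                new_u.append([1.0 if i == labels[k] else 0.0 for i in range(c)])
                continue
            # kernel-induced squared distance, clamped to avoid division by zero
            d2 = [max(2.0 * (1.0 - gaussian_kernel(x, v, sigma)), 1e-12)
                  for v in centers]
            new_u.append([1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0))
                                    for j in range(c)) for i in range(c)])
        new_centers = []
        for i, v in enumerate(centers):
            # fixed-point center update with weights u^m * K(x, v)
            w = [new_u[k][i] ** m * gaussian_kernel(x, v, sigma)
                 for k, x in enumerate(data)]
            total = sum(w)
            if total == 0.0:
                new_centers.append(v)
                continue
            new_centers.append([sum(w[k] * data[k][d] for k in range(n)) / total
                                for d in range(dim)])
        shift = max(abs(a - b) for new_row, old_row in zip(new_u, u)
                    for a, b in zip(new_row, old_row))
        u, centers = new_u, new_centers
        if shift < tol:  # stop when memberships no longer change
            break
    return u, centers


def xie_beni(data, u, centers, m=2.0, sigma=1.0):
    """Xie-Beni index in kernel space: compactness / (n * min separation)."""
    n, c = len(data), len(centers)
    compact = sum(u[k][i] ** m * 2.0 * (1.0 - gaussian_kernel(data[k], centers[i], sigma))
                  for k in range(n) for i in range(c))
    sep = min(2.0 * (1.0 - gaussian_kernel(centers[i], centers[j], sigma))
              for i in range(c) for j in range(c) if i != j)
    return compact / (n * max(sep, 1e-12))


def choose_c(data, c_range, **kw):
    """Run the clustering for each candidate c and keep the lowest Xie-Beni score."""
    scores = {}
    for c in c_range:
        u, v = semi_supervised_kfcm(data, c, **kw)
        scores[c] = xie_beni(data, u, v, m=kw.get("m", 2.0), sigma=kw.get("sigma", 1.0))
    return min(scores, key=scores.get), scores
```

For example, with `data = [[0, 0], [0.1, 0.2], [0.2, 0.1], [5, 5], [5.1, 5.2], [5.2, 4.9]]`, `labels = {0: 0, 3: 1}` and `sigma=2.0`, the unlabeled points should receive memberships close to 1 for their own group, and `choose_c(data, [2, 3], labels=labels, sigma=2.0)` should return 2: a third center has to share a group with another center, which shrinks the separation term and inflates the index.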
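The abstract does not describe its sparse format or scatter-gather strategy in detail. One plausible reading, assumed here purely for illustration, stores each document as a `{term_id: weight}` dictionary and keeps cluster centers dense: the kernel distance "gathers" only the center entries at a document's nonzero terms via ||x - v||² = ||x||² - 2x·v + ||v||², and the center update "scatters" each document's nonzeros into a dense accumulator. The helper names below are hypothetical.

```python
import math


def sparse_sq_norm(x):
    """Squared L2 norm of a sparse vector stored as {term_id: weight}."""
    return sum(w * w for w in x.values())


def gather_dot(x, center):
    """Gather step: read only the dense center entries at x's nonzero term ids."""
    return sum(w * center[t] for t, w in x.items())


def sparse_gaussian_kernel(x, center, center_sq_norm, sigma):
    """K(x, v) = exp(-||x - v||^2 / sigma^2), with ||v||^2 cached per center."""
    sq = sparse_sq_norm(x) - 2.0 * gather_dot(x, center) + center_sq_norm
    return math.exp(-max(sq, 0.0) / sigma ** 2)  # clamp tiny negative rounding


def scatter_accumulate(docs, weights, dim):
    """Scatter step: add each document's nonzeros, scaled by its weight, into a
    dense buffer, then normalize (the weighted-mean form of a center update)."""
    acc = [0.0] * dim
    for x, w in zip(docs, weights):
        for t, val in x.items():
            acc[t] += w * val
    total = sum(weights)
    return [a / total for a in acc] if total else acc
```

With this layout, each kernel evaluation costs time proportional to a document's number of nonzero terms rather than the vocabulary size, provided ||v||² is computed once per center per iteration.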
