
An Optimized k-means Algorithm Based on Information Entropy


Abstract

Clustering, in which data objects are divided into groups, is widely used in data mining and pattern recognition applications. The k-means algorithm is one of the most classical clustering algorithms. Because it selects its initial clustering centers at random, its clustering results are unstable. To solve this problem, an optimized algorithm for selecting the initial centers is proposed. The proposed algorithm defines a dispersion degree based on entropy. All objects are first grouped into one big cluster, and the object with the maximum dispersion degree and the object with the minimum dispersion degree are selected from it as the initial clustering centers. The other objects in the biggest cluster are then assigned to whichever of the initial clusters is nearest. This partition process is repeated until the number of clusters equals the specified value k. Finally, the k partitioned clusters and their centers are passed to the k-means algorithm as initial clusters and centers. Several experiments on real data sets evaluate the proposed algorithm against the traditional k-means algorithm and the max-min distance clustering algorithm. The experimental results show that the improved k-means algorithm is stable in selecting the initial clustering because it selects unique initial clustering centers. The experiments also verify the optimized algorithm's effectiveness and feasibility: it reduces the number of iterations and yields more stable clustering results and higher accuracy.
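
The initialization procedure in the abstract can be illustrated with a short Python sketch. The abstract does not give the formula for the dispersion degree, so `dispersion_degree` below is a hypothetical stand-in (the Shannon entropy of each object's normalized distances to the other objects in its cluster), and the function names `entropy_init` and `kmeans` are likewise illustrative. This is a minimal sketch under those assumptions, not the authors' implementation.

```python
import numpy as np


def dispersion_degree(points):
    """Hypothetical entropy-based dispersion degree (an assumption, not the
    paper's definition): Shannon entropy of each object's normalized
    distances to all objects in the same cluster."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    row_sums = dist.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # identical points: avoid 0/0
    p = dist / row_sums
    safe = np.where(p > 0, p, 1.0)         # treat 0 * log(0) as 0
    return -(p * np.log(safe)).sum(axis=1)


def entropy_init(X, k):
    """Repeatedly split the biggest cluster until k clusters exist."""
    X = np.asarray(X, dtype=float)
    labels = np.zeros(len(X), dtype=int)   # all objects start in one cluster
    n_clusters = 1
    while n_clusters < k:
        biggest = np.bincount(labels, minlength=n_clusters).argmax()
        idx = np.flatnonzero(labels == biggest)
        d = dispersion_degree(X[idx])
        hi, lo = idx[d.argmax()], idx[d.argmin()]
        # Assign each object in the biggest cluster to the nearer seed.
        to_hi = np.linalg.norm(X[idx] - X[hi], axis=1)
        to_lo = np.linalg.norm(X[idx] - X[lo], axis=1)
        move = to_lo < to_hi
        if not move.any():                 # cluster cannot be split further
            break
        labels[idx[move]] = n_clusters
        n_clusters += 1
    centers = np.array([X[labels == c].mean(axis=0) for c in range(n_clusters)])
    return labels, centers


def kmeans(X, centers, max_iter=100):
    """Standard Lloyd iterations started from the given centers."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for n_iter in range(1, max_iter + 1):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers, n_iter


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blobs = [rng.normal(loc=c, scale=0.3, size=(20, 2))
             for c in ((0, 0), (5, 5), (0, 5))]
    X = np.vstack(blobs)
    _, init_centers = entropy_init(X, k=3)
    labels, centers, n_iter = kmeans(X, init_centers)
    print(centers, n_iter)
```

Because the seeding uses no random numbers, repeated runs on the same data start k-means from the same centers, which is the stability property the abstract emphasizes.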

Bibliographic record

  • Source
    The Computer Journal | 2021, Issue 7 | pp. 1130-1143 | 14 pages
  • Author affiliations

    College of Artificial Intelligence and Key Laboratory of Software Engineering, Guangxi University for Nationalities, Nanning 530006, China; Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China;

    Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin 541004, China;

    College of Artificial Intelligence and Key Laboratory of Software Engineering, Guangxi University for Nationalities, Nanning 530006, China;

    College of Artificial Intelligence and Key Laboratory of Software Engineering, Guangxi University for Nationalities, Nanning 530006, China;

    College of Artificial Intelligence and Key Laboratory of Software Engineering, Guangxi University for Nationalities, Nanning 530006, China;

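  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords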

    information entropy; dispersion degree; k-means; clustering; clustering center;

