Source: International Conference on Smart Technology and Applications

Improved K-Means Algorithm on Home Industry Data Clustering in the Province of Bangka Belitung

Abstract

The Government of Bangka Belitung Islands Province has not yet classified its home industries. To address this, we propose a k-means algorithm for clustering home industry data. The k-means algorithm is widely used because it is straightforward and well suited to grouping data. However, it has a weakness: the initial cluster centers are chosen randomly. If the randomly chosen initial centroids are poor, the resulting grouping is suboptimal. Internal cluster validation is one way to determine the optimal number of clusters without prior information about the data. This study aims to identify the optimal grouping by improving the k-means algorithm and then testing it with two internal cluster validation indices, the Davies-Bouldin Index (DBI) and the Silhouette Index (SI), on home industry data from Bangka Belitung Islands Province. With the improved k-means algorithm, both the Silhouette Index and the DBI indicate an optimal cluster count of k = 3. With the traditional k-means algorithm, both indices indicate k = 2. We conclude that k = 3, as evaluated by the Davies-Bouldin Index, gives good results for clustering home industry data in Bangka Belitung Islands Province.
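The workflow described in the abstract (run k-means from non-random initial centroids, then score each candidate k with the DBI and the SI) can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not say how the improved algorithm picks its initial centroids, so the hypothetical `farthest_point_init` below is a stand-in deterministic seeding, and the data are synthetic 2-D points rather than the home industry data. It assumes Python 3.8+ for `math.dist`.

```python
import math
import random

def mean(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def farthest_point_init(data, k):
    """Assumed stand-in seeding (NOT the paper's method): start at the point
    nearest the overall mean, then keep adding the point farthest from all
    centers chosen so far. Deterministic, unlike random initialization."""
    overall = mean(data)
    centers = [min(data, key=lambda p: math.dist(p, overall))]
    while len(centers) < k:
        centers.append(max(data, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def assign(data, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in data]

def kmeans(data, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        labels = assign(data, centers)
        updated = []
        for j, old in enumerate(centers):
            members = [p for p, l in zip(data, labels) if l == j]
            updated.append(mean(members) if members else old)  # keep empty clusters in place
        if updated == centers:
            break
        centers = updated
    return assign(data, centers), centers

def davies_bouldin(data, labels, centers):
    """Davies-Bouldin Index: lower is better. Assumes distinct centers."""
    k = len(centers)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(data, labels) if l == j]
        scatter.append(sum(math.dist(p, centers[j]) for p in members) / len(members) if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                 for j in range(k) if j != i) for i in range(k)]
    return sum(worst) / k

def silhouette(data, labels, k):
    """Mean Silhouette Index: higher is better. Singleton clusters score 0."""
    clusters = [[p for p, l in zip(data, labels) if l == j] for j in range(k)]
    total = 0.0
    for p, l in zip(data, labels):
        own = clusters[l]
        if len(own) == 1:
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # distance to self is 0
        b = min(sum(math.dist(p, q) for q in clusters[j]) / len(clusters[j])
                for j in range(k) if j != l and clusters[j])
        total += (b - a) / max(a, b)
    return total / len(data)

# Synthetic data (an assumption for illustration): three well-separated 2-D blobs.
rng = random.Random(0)
blobs = [(0.0, 0.0), (12.0, 0.0), (6.0, 10.0)]
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in blobs for _ in range(20)]

scores = {}
for k in range(2, 6):
    labels, centers = kmeans(data, farthest_point_init(data, k))
    scores[k] = (davies_bouldin(data, labels, centers), silhouette(data, labels, k))
    print(f"k={k}  DBI={scores[k][0]:.3f}  SI={scores[k][1]:.3f}")

best_k_dbi = min(scores, key=lambda k: scores[k][0])  # lowest DBI
best_k_si = max(scores, key=lambda k: scores[k][1])   # highest SI
print("best k by DBI:", best_k_dbi, "| best k by SI:", best_k_si)
```

The two indices point in opposite directions: a lower DBI and a higher SI both indicate more compact, better separated clusters, so the preferred k minimizes the first and maximizes the second. Replacing `farthest_point_init` with a random sample of k points reproduces the traditional initialization whose sensitivity the abstract describes.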