Sriwijaya International Conference on Information Technology and Its Applications

Effect of Distance Metrics in Determining K-Value in K-Means Clustering Using Elbow and Silhouette Method

Abstract

Clustering is one of the main tasks in data mining: it groups similar data points together. Clustering approaches include partition-based, hierarchy-based and density-based methods. Partition-based clustering divides the data into non-overlapping subsets, and one of the most popular partition-based algorithms is K-means, which groups the data into K clusters based on each point's distance to its cluster centroid. Because K-means is partitional, the value of K must be determined before the algorithm is run. Determining K is a major problem because there is no universal way to find it. Two popular ways to determine K are the elbow method and the silhouette method, both of which are graph-based. Before either method is used, however, another factor must be decided: the distance metric. This paper shows the effect of three distance metrics, Manhattan, Euclidean and Minkowski, on finding the value of K with the elbow and silhouette methods. Based on this study, the choice of distance metric has little impact on determining the value of K in K-means with the elbow and silhouette methods. Manhattan distance shows the most variation in the elbow and silhouette graphs. The elbow method is difficult to use, and sometimes its graph does not reveal a clear value of K.
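The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the Minkowski distance of order p covers Manhattan (p = 1) and Euclidean (p = 2); K-means assigns points with the chosen metric; the elbow curve plots the within-cluster sum of squared distances (WCSS) against K; and the silhouette method prefers the K with the highest mean silhouette coefficient. The order p = 3 for the Minkowski case, the mean-based centroid update for every metric, the restart count `n_init`, the toy data, and all function names (`minkowski`, `assign`, `wcss`, `kmeans`, `silhouette`) are hypothetical choices, not details taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Minkowski distance of order p: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def assign(points: List[Point], centroids: List[Point], p: float) -> List[int]:
    """Label each point with the index of its nearest centroid under the chosen metric."""
    return [min(range(len(centroids)), key=lambda j: minkowski(pt, centroids[j], p))
            for pt in points]


def wcss(points: List[Point], labels: List[int], centroids: List[Point], p: float) -> float:
    """Within-cluster sum of squared distances: the y-value of the elbow curve."""
    return sum(minkowski(pt, centroids[lab], p) ** 2 for pt, lab in zip(points, labels))


def kmeans(points: List[Point], k: int, p: float, n_init: int = 10, max_iter: int = 100):
    """Lloyd-style K-means from n_init seeded random starts; returns the lowest-WCSS run.

    Assumption: centroids are updated as arithmetic means for every metric, although
    for Manhattan distance the median would be the optimal centre (as in K-medians).
    """
    best = None
    for seed in range(n_init):
        rng = random.Random(seed)
        # Start from k data points sampled without replacement.
        centroids = [tuple(float(v) for v in c) for c in rng.sample(points, k)]
        for _ in range(max_iter):
            labels = assign(points, centroids, p)
            new_centroids = []
            for j in range(k):
                members = [pt for pt, lab in zip(points, labels) if lab == j]
                # Mean of the members; keep the old centroid if the cluster is empty.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                    if members else centroids[j])
            if new_centroids == centroids:
                break  # converged: the assignments no longer move any centroid
            centroids = new_centroids
        labels = assign(points, centroids, p)
        score = wcss(points, labels, centroids, p)
        if best is None or score < best[0]:
            best = (score, labels, centroids)
    return best[1], best[2]


def silhouette(points: List[Point], labels: List[int], p: float) -> float:
    """Mean silhouette coefficient, s(i) = (b - a) / max(a, b), over all points."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, pt in enumerate(points):
        # a: mean distance to the other members of the point's own cluster.
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(minkowski(pt, q, p) for q in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = float("inf")
        for c in clusters:
            if c != labels[i]:
                members = [q for j, q in enumerate(points) if labels[j] == c]
                b = min(b, sum(minkowski(pt, q, p) for q in members) / len(members))
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: three well-separated 2-D blobs, not the paper's dataset.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11),
            (11, 10), (11, 11), (0, 10), (0, 11), (1, 10), (1, 11)]
    # Assumption: p=3 stands in for the "Minkowski" case; the abstract gives no p.
    for name, p in [("Manhattan", 1), ("Euclidean", 2), ("Minkowski p=3", 3)]:
        print(name)
        for k in range(2, 7):
            labels, centroids = kmeans(data, k, p)
            # Elbow: look for the bend in WCSS; silhouette: prefer the highest score.
            print(f"  k={k}  WCSS={wcss(data, labels, centroids, p):8.3f}  "
                  f"silhouette={silhouette(data, labels, p):.3f}")
```

The silhouette column gives a single maximum to pick, while reading the WCSS column for a bend is the subjective step that the abstract describes as difficult. Because Lloyd-style K-means can stop in a poor local optimum, the sketch keeps the best of several seeded restarts, similar in spirit to the `n_init` option of scikit-learn's `KMeans`.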