Conference: International Conference on Soft Computing for Problem Solving

Data Clustering: Integrating Different Distance Measures with Modified k-Means Algorithm

Abstract

Unsupervised learning partitions a given data set into a number of clusters such that similar data objects belong to the same cluster and dissimilar data objects belong to different clusters. k-Means is a partition-based unsupervised learning algorithm that is popular for its simplicity and ease of use. However, k-Means has a major shortcoming: the number of clusters and the initial centroids must be supplied in advance. Decimal scaling is a normalization approach that standardizes the features of the dataset and improves the effectiveness of the algorithm. Integrating different distance measures with a modified k-Means algorithm helps to select the proper distance measure for a specific data mining application. This paper compares the results of modified k-Means (Mk-Means) under different distance measures (Euclidean distance, Manhattan distance, Minkowski distance, and cosine measure distance) combined with decimal scaling normalization. The analysis is carried out on various datasets from the UCI Machine Learning Repository and shows that Mk-Means is advantageous and that its effectiveness improves with the normalized approach and the Minkowski distance measure.
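
The abstract names decimal scaling and four distance measures without giving their formulas. The following is a minimal Python sketch of the textbook definitions, not the paper's implementation. All function names are hypothetical, the Minkowski order `p=3` is an arbitrary illustrative default, and the cosine measure is assumed to be one minus the cosine similarity. The `nearest_centroid` helper only illustrates the assignment step of plain k-Means with a pluggable distance; the modifications that define Mk-Means are not described in the abstract and are not reproduced here.

```python
import math


def decimal_scale(column):
    """Decimal scaling: divide every value by 10**j, where j is the
    smallest non-negative integer such that max(|v|) / 10**j < 1.
    Assumes a non-empty list of numbers."""
    max_abs = max(abs(v) for v in column)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in column]


def minkowski(a, b, p=3):
    """Minkowski distance of order p (p=3 is an illustrative default)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan(a, b):
    """Manhattan distance: Minkowski distance with p = 1."""
    return minkowski(a, b, p=1)


def euclidean(a, b):
    """Euclidean distance: Minkowski distance with p = 2."""
    return minkowski(a, b, p=2)


def cosine_distance(a, b):
    """Assumed definition: 1 - cosine similarity.
    Undefined (division by zero) if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_centroid(point, centroids, distance):
    """Assignment step of plain k-Means: index of the closest centroid
    under the supplied distance function."""
    return min(range(len(centroids)),
               key=lambda i: distance(point, centroids[i]))


if __name__ == "__main__":
    # Each feature column is scaled independently.
    print(decimal_scale([120, -45, 986]))
    # The same point can be assigned under any of the measures.
    centroids = [(0.1, 0.2), (0.8, 0.9)]
    for dist in (euclidean, manhattan, minkowski, cosine_distance):
        print(dist.__name__, nearest_centroid((0.2, 0.1), centroids, dist))
```

Passing the distance as a function argument keeps the clustering loop unchanged while the measure is swapped, which is the kind of comparison the abstract describes.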