Data Clustering: Integrating Different Distance Measures with Modified k-Means Algorithm

Abstract

Unsupervised learning is the process of partitioning a given data set into a number of clusters such that similar data objects belong to the same cluster and dissimilar data objects belong to different clusters. k-Means is a partition-based unsupervised learning algorithm that is popular for its simplicity and ease of use. However, k-Means has a major shortcoming: the number of clusters and the initial centroids must be supplied in advance. Decimal scaling is a normalization approach that standardizes the features of the dataset and improves the effectiveness of the algorithm. Integrating different distance measures with the modified k-Means algorithm helps to select the proper distance measure for a specific data mining application. This paper compares the results of modified k-Means (Mk-Means) under different distance measures (Euclidean, Manhattan, Minkowski, and cosine distance), combined with the decimal scaling normalization approach. The result analysis, carried out on various datasets from the UCI machine learning repository, shows that Mk-Means is advantageous and that its effectiveness improves with the normalized approach and the Minkowski distance measure.
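To make the terms in the abstract concrete, the following is a minimal Python sketch of decimal scaling normalization and the four named distance measures. It is not the authors' Mk-Means implementation, which this record does not describe. The function names are hypothetical, the Minkowski order `p` is left as a parameter because the abstract does not state the value used, and cosine distance is assumed here to mean one minus cosine similarity.

```python
import math
from typing import List, Sequence


def decimal_scale(column: Sequence[float]) -> List[float]:
    """Decimal scaling: divide one feature column by 10**j, where j is the
    smallest non-negative integer that brings every |value| below 1."""
    max_abs = max(abs(v) for v in column)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in column]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of the absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Generalized distance: p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity (an assumed definition, see lead-in)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

For instance, `decimal_scale([120, 45, 986])` divides every value by 1000, giving 0.12, 0.045 and 0.986. Any of the distance functions could then be passed to a k-Means style assignment step that picks the nearest centroid for each normalized point.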


Bibliographic details

Source: International Conference on Soft Computing for Problem Solving, 2012, 10 pages
Authors: Vaishali R. Patel; Rupa G. Mehta
Original format: PDF
Chinese Library Classification: TP301.6-532
Keywords: Cluster Analysis; Decimal Scaling; Distance Measures; Mk-Means Algorithm

Similar literature

1. Distance based k-means clustering algorithm for determining number of clusters for high dimensional data [J]. Alibuhtto M., Mahat N. Decision Science Letters, 2020, No. 1
2. Fuzzy K-means clustering algorithms for interval-valued data based on adaptive quadratic distances [J]. Francisco de A.T. de Carvalho, Camilo P. Tenorio. Fuzzy Sets and Systems, 2010, No. 23
3. Improved rough k-means clustering algorithm based on weighted distance measure with Gaussian function [J]. Zhang Tengfei, Ma Fumin. International Journal of Computer Mathematics, 2017, No. 1a4
4. Data Clustering: Integrating Different Distance Measures with Modified k-Means Algorithm [C]. Vaishali R. Patel, Rupa G. Mehta. International Conference on Soft Computing for Problem Solving, 2012
5. Clustering educational digital library usage data: Comparisons of latent class analysis and K-means algorithms [D]. Xu, Beijie. 2011
6. Balancing effort and benefit of K-means clustering algorithms in Big Data realms [O]. Joaquín Pérez-Ortega, Nelva Nely Almanza-Ortega, David Romero. 2012
7. An Entropy Regularization k-Means Algorithm with a New Measure of between-Cluster Distance in Subspace Clustering [O]. Liyan Xiong, Cheng Wang, Xiaohui Huang, 2019
