Journal: Journal of computer sciences

Computational Complexity between K-Means and K-Medoids Clustering Algorithms for Normal and Uniform Distributions of Data Points


Abstract

Problem statement: Clustering is one of the most important research areas in data mining. Clustering means grouping objects by their features so that objects in the same group are similar and objects in different groups are dissimilar. Clustering is an unsupervised learning technique. Its main advantage is that interesting patterns and structures can be found directly in very large data sets with little or no background knowledge. Clustering algorithms can be applied in many domains. Approach: In this research, the most representative algorithms, K-Means and K-Medoids, were examined and analyzed based on their basic approach. The best algorithm in each category was identified based on performance. The input data points were generated in two ways, one using a normal distribution and the other using a uniform distribution. Results: The randomly distributed data points were taken as input to these algorithms, and clusters were found for each algorithm. The algorithms were implemented in Java, and their performance was analyzed based on clustering quality. The execution time of the algorithms in each category was compared across different runs. The accuracy of each algorithm was investigated across different executions of the program on the input data points. Conclusion: The average time taken by the K-Means algorithm is greater than the time taken by the K-Medoids algorithm for both normal and uniform distributions. The results proved to be satisfactory.
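
The experimental setup described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation. The class name `ClusteringSketch`, the two-dimensional points, the distribution parameters (N(0, 1) and uniform on [0, 1)), the fixed seed, the point count and the value of k are all assumptions made for illustration. The K-Medoids variant shown is a simple alternating scheme (assign points, then pick the best member as medoid), which may differ from the variant the paper evaluated.

```java
import java.util.Random;

// Illustrative sketch only: not the implementation evaluated in the paper.
class ClusteringSketch {

    // n two-dimensional points; gaussian=true draws each coordinate from N(0, 1),
    // gaussian=false draws each coordinate uniformly from [0, 1). (Assumed parameters.)
    static double[][] generate(int n, boolean gaussian, Random rng) {
        double[][] pts = new double[n][2];
        for (double[] p : pts) {
            p[0] = gaussian ? rng.nextGaussian() : rng.nextDouble();
            p[1] = gaussian ? rng.nextGaussian() : rng.nextDouble();
        }
        return pts;
    }

    // Euclidean distance between two 2-D points.
    static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Assigns every point to its nearest center; returns true if any label changed.
    static boolean assign(double[][] pts, double[][] centers, int[] labels) {
        boolean changed = false;
        for (int i = 0; i < pts.length; i++) {
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (dist(pts[i], centers[c]) < dist(pts[i], centers[best])) best = c;
            }
            if (best != labels[i]) { labels[i] = best; changed = true; }
        }
        return changed;
    }

    // K-Means update: move each center to the mean of its members.
    static void updateMeans(double[][] pts, int[] labels, double[][] centers) {
        double[][] sum = new double[centers.length][2];
        int[] count = new int[centers.length];
        for (int i = 0; i < pts.length; i++) {
            sum[labels[i]][0] += pts[i][0];
            sum[labels[i]][1] += pts[i][1];
            count[labels[i]]++;
        }
        for (int c = 0; c < centers.length; c++) {
            if (count[c] > 0) { // an empty cluster keeps its previous center
                centers[c] = new double[] {sum[c][0] / count[c], sum[c][1] / count[c]};
            }
        }
    }

    // K-Medoids update: pick the member with the smallest total distance to its cluster.
    static void updateMedoids(double[][] pts, int[] labels, double[][] centers) {
        for (int c = 0; c < centers.length; c++) {
            double bestCost = Double.POSITIVE_INFINITY;
            for (int i = 0; i < pts.length; i++) {
                if (labels[i] != c) continue;
                double cost = 0.0;
                for (int j = 0; j < pts.length; j++) {
                    if (labels[j] == c) cost += dist(pts[i], pts[j]);
                }
                if (cost < bestCost) { bestCost = cost; centers[c] = pts[i]; }
            }
        }
    }

    // Alternates assignment and update steps until labels stop changing or maxIter is hit.
    // Initial centers are random data points (duplicates are possible in this sketch).
    static int[] cluster(double[][] pts, int k, boolean medoids, int maxIter, Random rng) {
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = pts[rng.nextInt(pts.length)];
        int[] labels = new int[pts.length];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = assign(pts, centers, labels);
            if (!changed && iter > 0) break;
            if (medoids) updateMedoids(pts, labels, centers);
            else updateMeans(pts, labels, centers);
        }
        return labels;
    }

    public static void main(String[] args) {
        Random rng = new Random(42); // fixed seed (assumption) for repeatable runs
        for (boolean gaussian : new boolean[] {true, false}) {
            double[][] pts = generate(1000, gaussian, rng);
            for (boolean medoids : new boolean[] {false, true}) {
                long start = System.nanoTime();
                cluster(pts, 5, medoids, 100, rng);
                double ms = (System.nanoTime() - start) / 1e6;
                System.out.printf("%s data, %s: %.2f ms%n",
                        gaussian ? "normal" : "uniform",
                        medoids ? "K-Medoids" : "K-Means", ms);
            }
        }
    }
}
```

Timings printed by a sketch like this depend on JVM warm-up, the seed, the number of points and the choice of k, so they should not be read as a reproduction of the reported results.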
