Conference: International Conference on Contemporary Computing

Range clustering: An algorithm for empirical evaluation of classical clustering algorithms


Abstract

Cluster analysis is a principal method in the analytics domain of data mining. The choice of clustering algorithm directly influences the clusters obtained. Data are clustered to identify patterns and trends that are not apparent from inspecting the data directly. Clustering may be supervised (if a training data set is available) or unsupervised (if no training data set is available). Unsupervised clustering is usually done with the k-means algorithm, which can use any distance measure, most commonly Euclidean or Manhattan distance. The drawback of k-means on a large data set is the heavy computation required in every iteration to partition the data into multiple subsets, which limits its efficiency and its usefulness for large data sets. We propose a range-based, single-pass clustering algorithm that assigns each data point to the range in which it falls, where the ranges are calculated using the simple arithmetic mean of two values. The proposed algorithm is compared against the standard k-means algorithm (using Euclidean distance and Manhattan distance).
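The abstract does not spell out how the ranges are built, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes one-dimensional data and a hypothetical list of reference values (`seeds`); each range boundary is taken to be the arithmetic mean of two consecutive reference values, and every point is assigned in a single pass to the range it falls in. The function names `range_boundaries` and `range_cluster` are illustrative and do not come from the paper.

```python
from bisect import bisect_right


def range_boundaries(seeds):
    """Return range boundaries as the arithmetic mean of each pair of
    consecutive reference values (assumption: seeds are 1-D numbers)."""
    ordered = sorted(seeds)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


def range_cluster(values, seeds):
    """Assign each value to the index of the range it falls in.

    The data is traversed once; no iterative re-assignment is done,
    in contrast to the repeated passes of k-means.
    """
    bounds = range_boundaries(seeds)
    # bisect_right returns how many boundaries are <= the value,
    # which is the index of the range containing it.
    return [bisect_right(bounds, v) for v in values]


if __name__ == "__main__":
    data = [1, 2, 9, 10, 20, 21]
    seeds = [0, 10, 20]          # hypothetical reference values
    print(range_boundaries(seeds))     # [5.0, 15.0]
    print(range_cluster(data, seeds))  # [0, 0, 1, 1, 2, 2]
```

Under these assumptions, the cost is one sort of the reference values plus one binary search per data point. This illustrates why a single-pass range assignment can be cheaper than k-means, which recomputes distances to every centroid in each iteration.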
