Conference of the Spanish Association for Artificial Intelligence

An Approach to Silhouette and Dunn Clustering Indices Applied to Big Data in Spark

Abstract

The K-Means and Bisecting K-Means clustering algorithms need the optimal number of clusters into which the dataset should be divided. The Spark implementations of these algorithms include a method that is used to calculate this number. Unfortunately, this measure lacks precision because it considers only a sum of intra-cluster distances, which can mislead the results. Moreover, this measure has not been thoroughly validated in previous research on clustering indices. Therefore, we introduce a new Spark implementation of the Silhouette and Dunn indices. These clustering indices have been tested in previous work. The results obtained show the potential of Silhouette and Dunn for dealing with Big Data.
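
To make the contrast in the abstract concrete, the following is a minimal single-machine sketch in plain Python. It is not the authors' Spark implementation. It compares an intra-cluster-only cost (a within-cluster sum of squared distances to the centroid, assumed here as a stand-in for the measure the abstract criticizes) with textbook forms of the Silhouette and Dunn indices. The function names, the toy data, and the choice of Dunn variant (minimum pairwise distance between clusters over maximum pairwise diameter) are all assumptions made for illustration, and the pairwise loops are quadratic in the number of points, so this does not scale to Big Data.

```python
import math
from collections import defaultdict


def group_by_label(points, labels):
    """Collect points into lists keyed by their cluster label."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def wssse(points, labels):
    """Within-cluster sum of squared distances to each cluster centroid.
    Uses intra-cluster information only (illustrative stand-in)."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def dunn(points, labels):
    """Dunn index: min inter-cluster distance / max cluster diameter."""
    clusters = list(group_by_label(points, labels).values())
    if len(clusters) < 2:
        raise ValueError("Dunn index needs at least two clusters")
    max_diameter = max(math.dist(p, q) for c in clusters for p in c for q in c)
    min_separation = min(
        math.dist(p, q)
        for i, ci in enumerate(clusters)
        for cj in clusters[i + 1:]
        for p in ci
        for q in cj
    )
    return min_separation / max_diameter if max_diameter > 0 else math.inf


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # deliberately mixes the two groups

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, wssse(points, labels), silhouette(points, labels), dunn(points, labels))
```

Unlike the intra-cluster cost, both indices also account for how far apart the clusters are, so higher values indicate compact and well-separated clusters. A typical use would be to compute them for several candidate values of k and keep the k with the best score.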
