An Approach to Silhouette and Dunn Clustering Indices Applied to Big Data in Spark

Abstract

K-Means and Bisecting K-Means clustering algorithms need to be given the optimal number of clusters into which the dataset should be divided. The Spark implementations of these algorithms include a method used to calculate this number. Unfortunately, this measurement lacks precision because it only takes into account a sum of intra-cluster distances, which can mislead the results. Moreover, this measurement has not been well contrasted in previous research on clustering indices. Therefore, we introduce a new Spark implementation of the Silhouette and Dunn indices, two clustering indices that have been tested in previous work. The results obtained show the potential of Silhouette and Dunn to deal with Big Data.
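
To make the contrast in the abstract concrete, the following is a minimal sequential sketch, not the authors' Spark implementation. It assumes plain Python lists of points, Euclidean distance, and hypothetical helper names (`group`, `wssse`, `silhouette`, `dunn`). The within-cluster sum of squared errors (WSSSE) stands in for the "sum of intra-cluster distances" the abstract criticizes; its closest analogue in Spark MLlib is `KMeansModel.computeCost`. The toy data is three well-separated blobs, labeled once correctly (k = 3) and once with every blob cut in half (k = 6).

```python
import math
from collections import defaultdict
from itertools import combinations


def group(points, labels):
    """Map each cluster label to the list of points assigned to it (hypothetical helper)."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return dict(clusters)


def wssse(points, labels):
    """Within-cluster sum of squared Euclidean distances to each cluster centroid."""
    total = 0.0
    for members in group(points, labels).values():
        n, dims = len(members), len(members[0])
        centroid = tuple(sum(p[d] for p in members) / n for d in range(dims))
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total


def silhouette(points, labels):
    """Mean silhouette width in [-1, 1]; higher is better. Needs at least two clusters."""
    clusters = group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention: a point alone in its cluster scores 0
            continue
        # a: mean distance to the other members of p's own cluster (p itself adds 0)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: lowest mean distance from p to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def dunn(points, labels):
    """Smallest inter-cluster distance divided by largest cluster diameter; higher is better."""
    clusters = list(group(points, labels).values())
    # separation: closest pair of points that lie in different clusters
    separation = min(
        math.dist(p, q)
        for c1, c2 in combinations(clusters, 2)
        for p in c1
        for q in c2
    )
    # diameter: farthest pair of points that lie in the same cluster
    diameter = max(
        (math.dist(p, q) for members in clusters for p, q in combinations(members, 2)),
        default=0.0,
    )
    return math.inf if diameter == 0 else separation / diameter


if __name__ == "__main__":
    # Toy data: three well-separated 2x2 blobs of unit-spaced points.
    blob = [(0, 0), (0, 1), (1, 0), (1, 1)]
    offsets = [(0, 0), (10, 0), (0, 10)]
    points = [(x + dx, y + dy) for dx, dy in offsets for x, y in blob]
    natural = [i // 4 for i in range(len(points))]    # one cluster per blob (k = 3)
    oversplit = [i // 2 for i in range(len(points))]  # every blob cut in half (k = 6)
    for name, labels in [("k=3", natural), ("k=6", oversplit)]:
        print(name,
              "WSSSE", round(wssse(points, labels), 3),
              "silhouette", round(silhouette(points, labels), 3),
              "Dunn", round(dunn(points, labels), 3))
```

Cutting each blob in half lowers WSSSE from 6.0 to 3.0 (splitting a cluster can never raise it), so a measure built only on intra-cluster distances rewards the worse partition. The Dunn index instead falls from about 6.36 to 1.0 and the mean silhouette from above 0.8 to about 0.17. Both indices need all pairwise distances, which is O(n²) work; that quadratic cost is what a distributed Spark implementation would have to spread across the cluster.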

Bibliographic details

Source: Conference of the Spanish Association for Artificial Intelligence, 2016, pp. 160-169 (10 pages)
Authors: Jose Maria Luna-Romera; Maria del Mar Martinez-Ballesteros; Jorge Garcia-Gutierrez; Jose C. Riquelme-Santos
Original format: PDF
Keywords: Silhouette; Dunn; Clustering index; Big data; Spark

Similar documents

1. Use of cluster separation indices and the influence of outliers: application of two new separation indices, the modified silhouette index and the overlap coefficient to simulated data and mouse urine metabolomic profiles [J]. Sarah J. Dixon, Nina Heinrich, Maria Holmboe. Journal of Chemometrics, 2009, Issue 1-2.

2. Approximating Dunn’s Cluster Validity Indices for Partitions of Big Data [J]. Rathore Punit, Ghafoori Zahra, Bezdek James C. IEEE Transactions on Cybernetics, 2019, Issue 5.

3. A Novel Approach for Privacy Preservation in Bigdata Using Data Perturbation in Nested Clustering in Apache Spark [J]. Hariharan R, Raj S. Saran, Vimala R. Journal of Computational and Theoretical Nanoscience, 2018, Issue 9-10.

4. An Approach to Silhouette and Dunn Clustering Indices Applied to Big Data in Spark [C]. Jose Maria Luna-Romera, Maria del Mar Martinez-Ballesteros, Jorge Garcia-Gutierrez. Conference of the Spanish Association for Artificial Intelligence, 2016.

5. A Relational Framework for Clustering and Cluster Validity and the Generalization of the Silhouette Measure [D]. Rawashdeh, Mohammad Y. 2013.

6. DAFi: A Directed Recursive Data Filtering and Clustering Approach for Improving and Interpreting Data Clustering Identification of Cell Populations from Polychromatic Flow Cytometry Data [O]. Alexandra J. Lee, Ivan Chang, Julie G. Burel.

7. A novel approach for big data classification based on hybrid parallel dimensionality reduction using spark cluster [O]. Ahmed Hussein Ali, Mahmood Zaki Abdullah. 2019.

8. Development of diagnostic techniques for the spark discharge and the flame characteristics, applied in a lean burn spark ignition engine to examine the relations between the spark discharge, the flame, the heat release and the fluid flow [R]. Lassesson, B., Marforio, K. 1994.
