Conference: National Symposium on Mathematical Sciences

Modified Distance in Average Linkage Based on M-Estimator and MAD_n Criteria in Hierarchical Cluster Analysis


Abstract

Clustering is the process of grouping a set of objects into classes of similar objects. It divides a large group of observations into smaller groups so that observations within each group are relatively similar and observations in different groups are relatively dissimilar. In this study, an agglomerative method of hierarchical cluster analysis is chosen, and clusters are constructed using the average linkage technique. Average linkage requires a distance between clusters, calculated as the average distance over all pairs of points with one point taken from each cluster. This average distance is not robust when an outlier is present, so it needs to be modified to overcome the outlier problem. To this end, an outlier detection criterion based on MADn is used, and the average distance is recalculated without the outlier. Next, the distance in average linkage is calculated based on a modified one-step M-estimator (MOM). The resulting clusters are presented in a dendrogram. To evaluate the goodness of the modified distance in average linkage clustering, bootstrap analysis is conducted on the dendrogram, and the bootstrap value (BP) is assessed for each branch that forms a group, to ensure the reliability of the branches constructed. The study found that the average linkage technique with the modified distance is significantly superior to the usual average linkage technique when an outlier is present. The two techniques are said to be similar when there is no outlier.
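The abstract does not spell out the computation, so the following is only a minimal sketch of how a MOM-based between-cluster distance might look, not the authors' implementation. It assumes Euclidean distance between points, assumes the MADn rule is applied to the set of pairwise distances between two clusters, and assumes the conventional cutoff of 2.24 and the normalizing constant 0.6745 commonly used with MADn. The function names (`madn`, `mom`, `mom_linkage_distance`) are hypothetical.

```python
import numpy as np


def madn(x):
    """Normalized median absolute deviation: MAD / 0.6745."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    return np.median(np.abs(x - med)) / 0.6745


def mom(x, k=2.24):
    """Modified one-step M-estimator (assumed form): the mean of the
    values that the MADn rule |x - median| > k * MADn does not flag."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    keep = np.abs(x - med) <= k * madn(x)
    return x[keep].mean()


def pairwise_distances(a, b):
    """Euclidean distances over all pairs, one point from each cluster."""
    a = np.atleast_2d(np.asarray(a, dtype=float))
    b = np.atleast_2d(np.asarray(b, dtype=float))
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).ravel()


def average_linkage_distance(a, b):
    """Usual average linkage: plain mean of all pairwise distances."""
    return pairwise_distances(a, b).mean()


def mom_linkage_distance(a, b, k=2.24):
    """Modified distance: MOM of the pairwise distances (an assumption)."""
    return mom(pairwise_distances(a, b), k)


if __name__ == "__main__":
    cluster_a = [[0, 0], [0, 1]]
    cluster_b = [[3, 0], [3, 1], [50, 0]]  # last point is an illustrative outlier
    print("average linkage:", average_linkage_distance(cluster_a, cluster_b))
    print("MOM linkage:    ", mom_linkage_distance(cluster_a, cluster_b))
```

In this toy example the single distant point inflates the plain average, while the MOM-based distance stays close to the separation of the remaining points. When no distance is flagged, the two functions return the same value, in line with the abstract's remark that the techniques are similar without outliers.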
