Journal: Annals of Operations Research

Centroid based Tree-Structured Data Clustering Using Vertex/Edge Overlap and Graph Edit Distance


Abstract

We consider a clustering problem in which the data objects are rooted m-ary trees with known node correspondence. We assume that the nodes of the trees are unweighted, but the edges can be unweighted or weighted. We measure the similarity and distance between two trees using vertex/edge overlap (VEO) and graph edit distance (GED), respectively. For both measures, we first study the problem of finding a centroid tree of a given cluster of trees in both the unweighted and weighted edge cases. We compute the optimal centroid tree of a given cluster for all measures except the weighted VEO, for which a heuristic is developed. We then propose k-means-based algorithms that repeat cluster assignment and centroid update steps until convergence. The initial centroid trees are constructed based on the properties of the data. The assignment steps utilize unweighted or weighted versions of VEO or GED to assign each tree to the most similar centroid tree. In the update steps, each centroid tree is updated by considering the trees assigned to it. The proposed algorithms are compared with the traditional k-modes and k-means on randomly generated datasets and are shown to be more effective and more robust to outliers in separating trees into clusters. We also apply our algorithms to real-world brain artery data and show that the previously observed age and sex effects on brain artery structures are revealed better by clustering with our algorithms than with the traditional k-modes and k-means.
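The assign-and-update loop described in the abstract can be illustrated with a minimal Python sketch for the unweighted case. Everything in it is an assumption made for illustration, since the abstract gives no formulas: each tree is a frozenset of (parent, child) edges over shared node labels, every tree has the same root label `"r"`, VEO and GED take their commonly used set-based forms, and the centroid update is a simple majority vote over edges. The function names (`veo`, `ged`, `majority_centroid`, `cluster`) are hypothetical, and the sketch is not claimed to reproduce the authors' algorithms or their initialization.

```python
from collections import Counter

ROOT = "r"  # assumed shared root label (known node correspondence)

def nodes_of(edges):
    """Vertex set of a rooted tree given as a set of (parent, child) edges."""
    return {ROOT} | {v for edge in edges for v in edge}

def veo(t1, t2):
    """Vertex/edge overlap similarity in [0, 1] (assumed set-based form)."""
    v1, v2 = nodes_of(t1), nodes_of(t2)
    shared = len(v1 & v2) + len(t1 & t2)
    total = len(v1) + len(v2) + len(t1) + len(t2)
    return 2 * shared / total

def ged(t1, t2):
    """Count of vertex and edge insertions/deletions turning t1 into t2."""
    return len(nodes_of(t1) ^ nodes_of(t2)) + len(t1 ^ t2)

def majority_centroid(trees):
    """Keep every edge that occurs in more than half of the given trees."""
    counts = Counter(edge for tree in trees for edge in tree)
    return frozenset(e for e, c in counts.items() if 2 * c > len(trees))

def assign(trees, centroids):
    """Label each tree with the index of its closest centroid under GED."""
    return [min(range(len(centroids)), key=lambda j: ged(t, centroids[j]))
            for t in trees]

def cluster(trees, centroids, max_iter=100):
    """Alternate assignment and centroid update until centroids stop changing."""
    centroids = list(centroids)
    for _ in range(max_iter):
        labels = assign(trees, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [t for t, lab in zip(trees, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(majority_centroid(members) if members else old)
        if updated == centroids:
            break
        centroids = updated
    return assign(trees, centroids), centroids
```

A call such as `cluster(trees, [trees[0], trees[-1]])` seeds the loop with two data trees as initial centroids; this seeding is a placeholder, not the data-driven initialization mentioned in the abstract. When every tree is connected and shares the root, an edge that appears in a majority of trees has its parent edge in at least as many, so the majority-vote centroid is itself a rooted tree.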