Journal: Evolving Systems

Density-based clustering of big probabilistic graphs

Abstract

Clustering is a machine learning task that groups similar objects into coherent sets, such that the objects within a cluster exhibit similar behavior. With the exponential increase in data volume, robust approaches are required to process the data and extract clusters. In addition to being large, datasets may carry uncertainty due to the heterogeneity of their data sources, resulting in Big Data. Modern machine learning approaches and algorithms widely use probability theory to determine data uncertainty. Such large, uncertain data can be transformed into a probabilistic graph-based representation. This work presents an approach for density-based clustering of big probabilistic graphs. The proposed approach clusters large probabilistic graphs using the graph's density, where the clustering process is guided by the nodes' degree and neighborhood information. The approach is evaluated on seven real-world benchmark datasets, namely protein-to-protein interaction, Yahoo, MovieLens, core, Last.fm, the Delicious social bookmarking system, and Epinions. These datasets are first transformed into a graph-based representation before the proposed clustering algorithm is applied. The obtained results are evaluated using three cluster validation indices, namely the Davies–Bouldin index, the Dunn index, and the Silhouette coefficient. The proposal is also compared with four state-of-the-art approaches for clustering large probabilistic graphs. The results obtained on the seven datasets with the three cluster validity indices suggest that the proposed approach performs better than the compared approaches.
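To make the idea of degree- and neighborhood-guided clustering on a probabilistic graph more concrete, the following is a minimal Python sketch. It is not the algorithm from the paper, whose details the abstract does not give. It assumes a DBSCAN-style scheme in which each edge carries an existence probability, a node's expected degree is the sum of the probabilities of its incident edges, and clusters grow from high-degree "core" nodes along sufficiently probable edges. The function names, the two thresholds, and the toy graph are all hypothetical.

```python
from collections import defaultdict, deque


def expected_degrees(edges):
    """Expected degree of each node: the sum of the existence
    probabilities of its incident edges."""
    deg = defaultdict(float)
    for (u, v), p in edges.items():
        deg[u] += p
        deg[v] += p
    return dict(deg)


def density_cluster(edges, min_expected_degree, min_edge_prob):
    """Illustrative density-based clustering (an assumption, not the
    paper's method). Nodes whose expected degree reaches
    min_expected_degree are treated as core nodes; clusters expand
    from core nodes over edges with probability >= min_edge_prob.
    Nodes that end up in no cluster are labeled -1."""
    # Neighborhoods use only the sufficiently probable edges.
    neighbors = defaultdict(set)
    for (u, v), p in edges.items():
        if p >= min_edge_prob:
            neighbors[u].add(v)
            neighbors[v].add(u)

    deg = expected_degrees(edges)
    core = {n for n, d in deg.items() if d >= min_expected_degree}

    labels = {}
    cluster_id = 0
    # Visit core nodes in order of decreasing expected degree.
    for seed in sorted(core, key=lambda n: (-deg[n], str(n))):
        if seed in labels:
            continue
        labels[seed] = cluster_id
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            if node not in core:
                continue  # border nodes join a cluster but do not expand it
            for nb in neighbors[node]:
                if nb not in labels:
                    labels[nb] = cluster_id
                    queue.append(nb)
        cluster_id += 1

    for n in deg:
        labels.setdefault(n, -1)
    return labels


# Toy probabilistic graph: two dense triangles joined by one unlikely
# edge, plus a weakly attached node "g".
toy_edges = {
    ("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.9,
    ("d", "e"): 0.9, ("e", "f"): 0.85, ("d", "f"): 0.9,
    ("c", "d"): 0.1,
    ("f", "g"): 0.2,
}
toy_labels = density_cluster(toy_edges, min_expected_degree=1.5, min_edge_prob=0.5)
print(toy_labels)
```

On this toy graph the two triangles form separate clusters and "g" is left unassigned, because the edges (c, d) and (f, g) fall below the probability threshold. A real evaluation would then score such a partition with indices like those named in the abstract.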
