
CASS: A distributed network clustering algorithm based on structure similarity for large-scale network


Abstract

As networks grow in size, analyzing large-scale network data is becoming increasingly important, and network clustering algorithms are a useful tool for such analysis. Conventional network clustering algorithms, which are actively being researched, target a single-machine environment rather than a parallel one. However, these algorithms cannot analyze large-scale network data because of memory size limitations. As a solution, we propose a network clustering algorithm for large-scale network data analysis using Apache Spark, changing the paradigm of the conventional clustering algorithm to improve its efficiency in the Apache Spark environment. We also apply optimization approaches such as a Bloom filter and shuffle selection to reduce memory usage and execution time. By evaluating the proposed algorithm with the average normalized cut, we confirmed that it can analyze diverse large-scale network datasets, including biological, co-authorship, internet topology, and social networks. Experimental results show that the proposed algorithm produces more accurate clusters than the comparison algorithms while using less memory. Furthermore, we confirm the effect of the proposed optimization approaches and the scalability of the proposed algorithm. In addition, we validate that clusters found by the proposed algorithm can represent biologically meaningful functions.
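The abstract names the average normalized cut as its evaluation measure but does not define it. As a rough illustration only, the sketch below assumes the common definition in which the normalized cut of a cluster is the number of edges leaving it divided by the sum of the degrees of its nodes, averaged over all clusters (lower is better). The function name `average_normalized_cut` and the input layout are hypothetical and are not taken from the paper; it runs on a single machine and does not reflect the Apache Spark implementation.

```python
from collections import defaultdict


def average_normalized_cut(edges, cluster_of):
    """Average, over clusters, of cut(C) / vol(C) for an undirected graph.

    edges      -- iterable of (u, v) pairs, each undirected edge listed once
    cluster_of -- dict mapping every node to its cluster id
    Assumed definition (not confirmed by the abstract):
      cut(C) = number of edges with exactly one endpoint in C
      vol(C) = sum of degrees of the nodes in C
    """
    cut = defaultdict(int)
    vol = defaultdict(int)
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        # Each edge contributes one unit of degree to each endpoint's cluster.
        vol[cu] += 1
        vol[cv] += 1
        if cu != cv:
            # The edge crosses a cluster boundary on both sides.
            cut[cu] += 1
            cut[cv] += 1
    # Clusters with no incident edges are skipped to avoid dividing by zero.
    scores = [cut[c] / vol[c] for c in set(cluster_of.values()) if vol[c] > 0]
    return sum(scores) / len(scores) if scores else 0.0


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
toy_score = average_normalized_cut(toy_edges, toy_clusters)
```

In the toy graph each cluster has one outgoing edge and a total degree of 7, so both clusters score 1/7 and so does the average.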
