
CASS: A distributed network clustering algorithm based on structure similarity for large-scale network

Abstract

As networks grow in size, analyzing large-scale network data is becoming increasingly important, and network clustering algorithms are a useful tool for that analysis. Most active research on network clustering targets single-machine rather than parallel environments. However, single-machine algorithms cannot analyze large-scale network data because of memory limits. As a solution, we propose a network clustering algorithm for large-scale network data built on Apache Spark, which reworks the paradigm of the conventional clustering algorithm so that it runs efficiently in the Spark environment. We also apply optimizations such as a Bloom filter and shuffle selection to reduce memory usage and execution time. Evaluating the proposed algorithm by average normalized cut, we confirmed that it can analyze diverse large-scale network datasets, including biological, co-authorship, internet topology, and social networks. Experimental results show that the proposed algorithm produces more accurate clusters than the comparison algorithms while using less memory. We further confirm the effectiveness of the proposed optimizations and the scalability of the algorithm. In addition, we validate that the clusters it finds can represent biologically meaningful functions.