Journal: Scientific Reports

Identifying robust communities and multi-community nodes by combining top-down and bottom-up approaches to clustering

Abstract

Biological functions are carried out by groups of interacting molecules, cells or tissues, known as communities. Membership in these communities may overlap when biological components are involved in multiple functions. However, traditional clustering methods detect only non-overlapping communities. The detected communities may also be unstable and difficult to replicate, because traditional methods are sensitive to noise and parameter settings. These shortcomings limit our ability to detect biological communities, and therefore our ability to understand biological functions. To address these limitations and detect robust overlapping biological communities, we propose an unorthodox clustering method called SpeakEasy, which identifies communities using top-down and bottom-up approaches simultaneously. Specifically, nodes join communities based on their local connections as well as on global information about the network structure. The method can quantify the stability of each community, automatically identify the number of communities, and quickly cluster networks with hundreds of thousands of nodes. SpeakEasy shows top performance on synthetic clustering benchmarks and accurately identifies meaningful biological communities in a range of datasets, including gene microarrays, protein interactions, sorted cell populations, electrophysiology and fMRI brain imaging.
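
To make the idea of combining local connections with global network information more concrete, here is a minimal toy sketch in Python. It is not the SpeakEasy implementation, and the abstract does not specify the update rule; the scoring used here (a label's count among a node's neighbours minus the count expected from its network-wide frequency) is an assumption chosen for illustration, and the function name `cluster_labels` and its parameters are hypothetical. The sketch assigns each node to a single community and does not attempt overlapping membership or stability estimation.

```python
import random
from collections import Counter


def cluster_labels(adjacency, n_iters=20, seed=0):
    """Toy label propagation mixing local and global label counts.

    adjacency: dict mapping each node to a list of neighbouring nodes
               (nodes must be sortable, e.g. integers).
    Returns a dict mapping each node to its final label.
    This is an illustrative assumption, not the published algorithm.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    labels = {node: node for node in nodes}  # every node starts alone

    for _ in range(n_iters):
        order = nodes[:]
        rng.shuffle(order)  # visit nodes in a fresh random order
        for node in order:
            neighbours = adjacency[node]
            if not neighbours:
                continue  # isolated nodes keep their own label
            # Bottom-up: labels observed among this node's neighbours.
            local = Counter(labels[nb] for nb in neighbours)
            # Top-down: how common each label is in the whole network.
            global_counts = Counter(labels.values())

            def score(label):
                # Observed neighbour count minus the count expected if
                # neighbour labels followed the global frequencies.
                expected = len(neighbours) * global_counts[label] / len(nodes)
                return local[label] - expected

            # Sorting the candidates makes tie-breaking deterministic.
            labels[node] = max(sorted(local), key=score)
    return labels


if __name__ == "__main__":
    # Two triangles joined by one edge (hypothetical example graph).
    graph = {
        0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
        3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
    }
    print(cluster_labels(graph))
```

Because labels only ever spread along edges, nodes in disconnected parts of a graph can never end up sharing a label. On small graphs this simplified rule may not settle into a single stable assignment, so repeating the run with different seeds and comparing the results is one rough way to see which groupings are robust.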