BMC Bioinformatics

Ranked Adjusted Rand: integrating distance and partition information in a measure of clustering agreement

Abstract

Background

Biological information is commonly used to cluster or classify entities of interest such as genes, conditions, species or samples. However, different sources of data can be used to classify the same set of entities, so methods are frequently needed to compare the performance of two data sources or to determine how well a given classification agrees with another, especially in the absence of a universally accepted "gold standard" classification.

Results

Here, we describe a novel measure, the Ranked Adjusted Rand (RAR) index. RAR differs from existing methods in that it evaluates the extent of agreement between any two groupings while taking intercluster distances into account. This matters for pairs of entities that one method places in the same cluster and another separates: the second method may assign them to neighbouring clusters or, on the contrary, to clusters that are far apart. RAR is applicable even when intercluster distance information is absent for one or both of the groupings; when it is absent for both, RAR is equal to its predecessor, the Adjusted Rand (HA) index. Artificially designed clusterings were used to demonstrate situations in which only RAR was able to detect differences in the grouping patterns. A study with larger simulated clusterings showed that, under realistic conditions, RAR effectively integrates distance and partition information. The new method was applied to biological examples to compare 1) two microbial typing methods, 2) two gene regulatory network distances and 3) microarray gene expression data with pathway information. In the first application, one of the methods does not provide intercluster distances, while the other produces a hierarchical clustering. RAR proved more sensitive than HA when choosing the threshold for defining clusters in the hierarchical method that maximizes agreement between the results of both methods.
Conclusion

The major advantage of RAR is that it combines cluster distance and partition information, whereas previously available methods used only the latter. RAR should be used in research problems where HA was previously used: in the absence of intercluster distance effects it is an equally effective measure, and in their presence it is a more complete one.
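The abstract does not give the RAR formula, so the sketch below does not implement RAR. It is a minimal illustration, under stated assumptions, of the two ingredients the abstract contrasts. First, it computes the Adjusted Rand index in the Hubert and Arabie formulation (HA), which RAR reduces to when neither grouping supplies intercluster distances. Second, it lists the pairs RAR is designed to weigh: entities grouped together by clustering A but separated by clustering B, each tagged with the distance between their two B clusters. The function names `adjusted_rand_ha` and `split_pairs` and the distance mapping `dist_b` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_ha(labels_a, labels_b):
    """Adjusted Rand index (Hubert and Arabie) between two flat partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same entities")
    total = comb(len(labels_a), 2)  # number of entity pairs
    if total == 0:
        return 1.0  # fewer than two entities: nothing can disagree
    # Contingency-table cell counts n_ij and the row/column marginals.
    sum_ij = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Happens only when both partitions are all singletons or both a
        # single cluster, i.e. they are identical.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def split_pairs(labels_a, labels_b, dist_b):
    """Pairs co-clustered in A but separated in B, with B's intercluster distance.

    dist_b is a hypothetical mapping {(c1, c2): distance} keyed by the
    sorted pair of B cluster labels.
    """
    out = []
    for i, j in combinations(range(len(labels_a)), 2):
        if labels_a[i] == labels_a[j] and labels_b[i] != labels_b[j]:
            key = tuple(sorted((labels_b[i], labels_b[j])))
            out.append((i, j, dist_b[key]))
    return out


if __name__ == "__main__":
    a = [0, 0, 0, 0, 1, 1]
    b = [0, 0, 1, 1, 2, 2]
    # Same partition B, two hypothetical cluster geometries.
    near = {(0, 1): 1.0, (0, 2): 5.0, (1, 2): 5.0}
    far = {(0, 1): 9.0, (0, 2): 5.0, (1, 2): 5.0}
    print(round(adjusted_rand_ha(a, b), 4))  # 0.4444
    print([d for _, _, d in split_pairs(a, b, near)])  # [1.0, 1.0, 1.0, 1.0]
    print([d for _, _, d in split_pairs(a, b, far)])  # [9.0, 9.0, 9.0, 9.0]
```

In this toy example HA is the same (4/9) whichever geometry B has, because HA sees only the partitions. The four split pairs, however, land in adjacent clusters under `near` and in distant clusters under `far`. This is the kind of difference the abstract says RAR is built to register.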