Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Proximity Measures for Clustering Gene Expression Microarray Data: A Validation Methodology and a Comparative Analysis



Abstract

Cluster analysis is usually the first step taken to extract information from gene expression microarray data. Besides selecting a clustering algorithm, choosing an appropriate proximity measure (similarity or distance) is of great importance for achieving satisfactory clustering results. Nevertheless, to date there are no comprehensive guidelines on how to choose proximity measures for clustering microarray data. Pearson correlation is the most widely used proximity measure, whereas the characteristics of other measures remain unexplored. In this paper, we investigate the choice of proximity measures for clustering microarray data by evaluating the performance of 16 proximity measures on 52 data sets from time-course and cancer experiments. Our results indicate that measures rarely employed in the gene expression literature can provide better results than commonly employed ones, such as Pearson, Spearman, and Euclidean distance. Because different measures stood out in the time-course and cancer data evaluations, the choice of measure should be specific to each scenario. To evaluate measures on time-course data, we preprocessed and compiled 17 data sets from the microarray literature into a benchmark and developed a new methodology, called Intrinsic Biological Separation Ability (IBSA). Both can be employed in future research to assess the effectiveness of new measures for gene time-course data.
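To illustrate why the choice of proximity measure matters, the following minimal sketch computes the three commonly employed measures named in the abstract (Pearson, Spearman, and Euclidean distance) on toy expression profiles. The profiles, the function names, and the use of 1 minus the correlation as a distance are assumptions made for illustration only; the sketch does not reproduce the paper's 16 measures, its data sets, or the IBSA methodology.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def euclidean(x, y):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation_distance(x, y, corr=pearson):
    """Turn a correlation into a distance (assumed convention: 1 - r)."""
    return 1.0 - corr(x, y)

# Hypothetical toy profiles, not taken from the paper.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same shape as gene_a, ten times the magnitude
gene_c = [1.0, 2.0, 4.0, 3.0]      # similar magnitude, last two points swapped

for name, other in (("gene_b", gene_b), ("gene_c", gene_c)):
    print(name,
          "pearson_dist=%.3f" % correlation_distance(gene_a, other),
          "spearman_dist=%.3f" % correlation_distance(gene_a, other, spearman),
          "euclidean=%.3f" % euclidean(gene_a, other))
```

In this toy case the correlation-based distances place gene_b closest to gene_a, while Euclidean distance places gene_c closest, so a clustering algorithm fed these distances could group the same genes differently depending on the measure.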
