Progress in Natural Science

Selections of data preprocessing methods and similarity metrics for gene cluster analysis


Abstract

Clustering is one of the major exploratory techniques for gene expression data analysis. High-quality clusters can be obtained only when a suitable similarity metric is used and the datasets are properly preprocessed. In this study, gene expression datasets with external evaluation criteria were preprocessed by row normalization, column normalization, or base-2 logarithm transformation, and were then clustered by hierarchical clustering, k-means clustering, and self-organizing maps (SOMs), with the Pearson correlation coefficient or Euclidean distance as the similarity metric. The quality of the resulting clusters was evaluated with the adjusted Rand index. The results show that k-means clustering and SOMs have distinct advantages over hierarchical clustering for gene clustering, and that SOMs perform slightly better than k-means when randomly initialized. They also show that hierarchical clustering works best with the Pearson correlation coefficient as the similarity metric and with row-normalized datasets, whereas k-means clustering and SOMs produce better clusters with Euclidean distance and logarithm-transformed datasets. These results provide a useful reference for implementing gene expression cluster analysis.
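The building blocks named in the abstract (the three preprocessing steps, the two similarity metrics, and the adjusted Rand index) can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation: the function names are hypothetical, and reading "normalization" as z-scoring (zero mean, unit variance) is an assumption, since the abstract does not define it.

```python
import math
from collections import Counter


def log2_transform(rows):
    """Base-2 logarithm of every value (values must be strictly positive)."""
    return [[math.log2(v) for v in row] for row in rows]


def normalize_rows(rows):
    """Z-score each row (gene). ASSUMPTION: the abstract does not say
    which normalization was used; z-scoring is one common choice."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / len(row))
        out.append([(v - mean) / sd if sd > 0 else 0.0 for v in row])
    return out


def normalize_columns(rows):
    """Z-score each column (sample) by transposing, reusing normalize_rows."""
    cols = normalize_rows([list(c) for c in zip(*rows)])
    return [list(r) for r in zip(*cols)]


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient (undefined for constant profiles)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between a reference and a predicted partition."""
    total = math.comb(len(labels_true), 2)
    if total == 0:
        return 1.0
    # Count pairs that fall in the same cell of the contingency table,
    # and pairs that share a cluster in each partition separately.
    sum_ij = sum(math.comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(math.comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(math.comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical
    return (sum_ij - expected) / (max_index - expected)


# Two made-up profiles with the same shape but different magnitude.
gene_a = [1.0, 2.0, 4.0]
gene_b = [10.0, 20.0, 40.0]
print(pearson(gene_a, gene_b))    # close to 1: correlation ignores scale
print(euclidean(gene_a, gene_b))  # large: Euclidean distance does not
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # label names do not matter
```

The toy profiles illustrate why the choice of metric interacts with preprocessing: Pearson correlation is insensitive to the magnitude of a profile, while Euclidean distance is not, so the two can rank the same pair of genes very differently on raw data. With the z-scoring used here, the squared Euclidean distance between two rows of length m equals 2m(1 - r), where r is their Pearson correlation, so the two metrics order gene pairs identically after row normalization.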
