Journal: Bioinformatics

Analysis of temporal gene expression profiles: clustering by simulated annealing and determining the optimal number of clusters

Abstract

Motivation: Cluster analysis of genome-wide expression data from DNA microarray hybridization studies has proved to be a useful tool for identifying biologically relevant groupings of genes and samples. In the present paper, we focus on several important issues related to clustering algorithms that have not yet been fully studied.

Results: We describe a simple and robust algorithm for clustering temporal gene expression profiles, based on the simulated annealing procedure. In general, this algorithm is guaranteed to eventually find the globally optimal distribution of genes over clusters. We introduce an iterative scheme that quantitatively evaluates the optimal number of clusters for each specific data set. The scheme is based on standard approaches used in regular statistical tests. The basic idea is to search for the optimal number of clusters simultaneously with the optimization of the distribution of genes over clusters. The efficiency of the proposed algorithm has been evaluated by means of a reverse engineering experiment, that is, a situation in which the correct distribution of genes over clusters is known a priori. This statistically rigorous test has shown that our algorithm places more than 90% of the genes into the correct clusters. Finally, the algorithm has been tested on real gene expression data (expression changes during the yeast cell cycle), for which the fundamental patterns of gene expression and the assignment of genes to clusters are well understood from numerous previous studies.

Availability: The source code of the program implementing the algorithm is available upon request from the authors.

Contact: alex_-lukashin@biogen.com
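To make the idea of clustering by simulated annealing concrete, the following is a minimal Python sketch. It is not the authors' program, which is available only on request. The cost function (within-cluster sum of squared distances), the geometric cooling schedule, the single-gene reassignment move, and all names (`clustering_cost`, `anneal_clusters`, `fraction_correct`) are assumptions chosen for illustration. The number of clusters `k` is held fixed here; the iterative scheme for choosing it is not described in enough detail in the abstract to reproduce.

```python
import itertools
import math
import random


def clustering_cost(profiles, labels, k):
    """Sum of squared distances from each profile to its cluster centroid."""
    dim = len(profiles[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for profile, c in zip(profiles, labels):
        counts[c] += 1
        for j, value in enumerate(profile):
            sums[c][j] += value
    # Empty clusters keep a dummy centroid; no profile refers to them.
    centroids = [[s / n for s in row] if n else row
                 for row, n in zip(sums, counts)]
    return sum(
        (value - mean) ** 2
        for profile, c in zip(profiles, labels)
        for value, mean in zip(profile, centroids[c])
    )


def anneal_clusters(profiles, k, t_start=1.0, t_end=1e-3, cooling=0.9,
                    moves_per_temp=200, seed=0):
    """Reassign one profile at a time under a Metropolis acceptance rule."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in profiles]
    cost = clustering_cost(profiles, labels, k)
    best_labels, best_cost = labels[:], cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            i = rng.randrange(len(profiles))
            old, new = labels[i], rng.randrange(k)
            if new == old:
                continue
            labels[i] = new
            new_cost = clustering_cost(profiles, labels, k)
            delta = new_cost - cost
            # Always accept improvements; accept a worse state with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best_labels, best_cost = labels[:], cost
            else:
                labels[i] = old  # reject: undo the move
        t *= cooling  # geometric cooling (an assumed schedule)
    return best_labels, best_cost


def fraction_correct(true_labels, found_labels, k):
    """Best agreement over all relabelings, since cluster ids are arbitrary."""
    n = len(true_labels)
    return max(
        sum(perm[f] == t for t, f in zip(true_labels, found_labels)) / n
        for perm in itertools.permutations(range(k))
    )


if __name__ == "__main__":
    # Synthetic check in the spirit of the reverse engineering experiment:
    # noisy copies of three made-up temporal patterns with known labels.
    rng = random.Random(1)
    patterns = [[0, 1, 2, 3, 2, 1], [3, 2, 1, 0, 1, 2], [1, 1, 1, 1, 1, 1]]
    true_labels = [c for c in range(3) for _ in range(10)]
    profiles = [[v + rng.gauss(0, 0.2) for v in patterns[c]]
                for c in true_labels]
    labels, cost = anneal_clusters(profiles, k=3)
    print("cost:", round(cost, 3))
    print("fraction correct:", fraction_correct(true_labels, labels, 3))
```

The sketch recomputes the full cost after every move, which keeps the code short but is slow for genome-scale data; an incremental update of the two affected clusters would be the usual refinement. A finite run such as this one may stop in a local optimum; convergence to the global optimum is a property of annealing only in the long-run limit.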
