Journal: IEEE Transactions on Evolutionary Computation

An evolutionary clustering algorithm for gene expression microarray data analysis

Abstract

Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. For more effective clustering of gene expression microarray data, which is typically very noisy, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on this encoding scheme, it uses a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that EvoCluster adopts is able to differentiate how relevant each feature value is in determining a particular cluster grouping. As such, instead of considering only local pairwise distances, it also takes into account how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.
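
The encoding described in the abstract (one chromosome per cluster grouping, one gene per cluster) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `make_chromosome`, `inject_cluster`, and `is_partition` are hypothetical, and the cluster-injection step is a generic grouping-style operator, not one of the reproduction operators defined in the paper. The fitness function is not sketched, since the abstract does not specify it.

```python
import random


def make_chromosome(labels):
    """Build a chromosome from per-record cluster labels.

    Each gene is one cluster, stored as a frozenset of record indices.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, set()).add(idx)
    return [frozenset(members) for members in clusters.values()]


def inject_cluster(receiver, donor, rng):
    """Hypothetical operator: copy one random gene (cluster) from donor into receiver.

    Records of the injected cluster are removed from the receiver's other
    clusters, and clusters left empty are dropped, so the child is still a
    partition of all records. The number of clusters may change.
    """
    gene = rng.choice(donor)
    child = [cluster - gene for cluster in receiver]
    child = [cluster for cluster in child if cluster]
    child.append(gene)
    return child


def is_partition(chromosome, n_records):
    """Check that every record index appears in exactly one cluster."""
    seen = sorted(i for cluster in chromosome for i in cluster)
    return seen == list(range(n_records))


# Two candidate groupings of six records (toy labels, not real data).
a = make_chromosome([0, 0, 1, 1, 2, 2])
b = make_chromosome([0, 1, 1, 1, 0, 0])
child = inject_cluster(a, b, random.Random(0))
print(len(a), len(b), len(child), is_partition(child, 6))
```

Because a gene is a whole cluster rather than a single record's label, exchanging genes moves grouping information between chromosomes directly, and the cluster count is free to vary from parent to child, which is in line with the abstract's remark that the number of clusters need not be fixed in advance.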
