Journal of Bioinformatics and Computational Biology

CLUSTERING AND RE-CLUSTERING FOR PATTERN DISCOVERY IN GENE EXPRESSION DATA

Abstract

The combined interpretation of gene expression data and gene sequences is important for investigating the intricate relationships of gene expression at the transcription level. Expression data produced by microarray hybridization experiments can lead to the identification of clusters of co-expressed genes that are likely co-regulated by the same regulatory mechanisms. Analyzing the promoter regions of co-expressed genes can reveal the common regulatory patterns characterized by transcription factor binding sites. Many clustering algorithms have been used to uncover inherent clusters in gene expression data. In this paper, based on experiments using simulated and real data, we show that the performance of these algorithms can be further improved. For clustering expression data, which are typically very noisy, we propose a two-phase clustering algorithm consisting of an initial clustering phase and a second re-clustering phase. The proposed algorithm has several desirable features: (i) it utilizes both local and global information by computing a "local" pairwise distance between two gene expression profiles in Phase 1 and a "global" probabilistic measure of the interestingness of cluster patterns in Phase 2, (ii) it distinguishes between relevant and irrelevant expression values when performing re-clustering, and (iii) it makes explicit the patterns discovered in each cluster for possible interpretation. Experimental results show that the proposed algorithm can be effective for discovering clusters in very noisy data. The patterns discovered in each cluster are found to be meaningful and statistically significant, and cannot otherwise be easily discovered. Based on these discovered patterns, genes co-expressed under the same experimental conditions and range of expression levels have been identified and evaluated. When identifying regulatory patterns in the promoter regions of the co-expressed genes, we also discovered well-known transcription factor binding sites in them. These binding sites can provide explanations for the co-expressed patterns.
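
The two-phase idea can be illustrated with a minimal sketch. The abstract does not name the distance, the Phase 1 clustering method, or the exact interestingness measure, so everything concrete here is an assumption: Euclidean distance with k-means (farthest-first seeding) stands in for Phase 1, and the adjusted residual of each (condition, expression level, cluster) pattern stands in for the "global" probabilistic measure in Phase 2. The function names, the discretization thresholds, and the 1.96 cutoff are hypothetical and are not taken from the paper.

```python
import math
import random


def euclidean(a, b):
    """'Local' pairwise distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def phase1_cluster(profiles, k, iters=20):
    """Phase 1 stand-in: k-means with farthest-first seeding (an assumption)."""
    centers = [list(profiles[0])]
    while len(centers) < k:
        # Next seed: the profile farthest from its nearest existing seed.
        centers.append(list(max(
            profiles, key=lambda p: min(euclidean(p, c) for c in centers))))
    labels = [0] * len(profiles)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: euclidean(p, centers[c]))
                  for p in profiles]
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def discretize(value, low=-0.5, high=0.5):
    """Map an expression value to a coarse level; thresholds are made up."""
    return "L" if value < low else "H" if value > high else "M"


def adjusted_residuals(levels, labels, k):
    """'Global' score of each (condition, level, cluster) pattern."""
    n, m = len(levels), len(levels[0])
    sizes = [labels.count(c) for c in range(k)]
    scores = {}
    for j in range(m):
        for v in "LMH":
            total_v = sum(1 for row in levels if row[j] == v)
            for c in range(k):
                obs = sum(1 for row, lab in zip(levels, labels)
                          if lab == c and row[j] == v)
                exp = total_v * sizes[c] / n  # expected count if independent
                var = (1 - total_v / n) * (1 - sizes[c] / n)
                if exp > 0 and var > 0:
                    scores[(j, v, c)] = (obs - exp) / math.sqrt(exp * var)
    return scores


def phase2_recluster(levels, labels, k, z=1.96, rounds=5):
    """Reassign genes using only patterns whose residual exceeds z."""
    labels = list(labels)
    for _ in range(rounds):
        scores = adjusted_residuals(levels, labels, k)

        def support(row, c):
            vals = (scores.get((j, v, c), 0.0) for j, v in enumerate(row))
            return sum(s for s in vals if s > z)  # irrelevant values add 0

        new_labels = []
        for row, old in zip(levels, labels):
            best = max(range(k), key=lambda c: support(row, c))
            # Keep the old label when no significant pattern matches.
            new_labels.append(best if support(row, best) > 0 else old)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data: conditions 0-2 carry signal, conditions 3-5 are bounded noise.
rng = random.Random(1)
group_a = [[2.0] * 3 + [rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]
group_b = [[-2.0] * 3 + [rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]
profiles = group_a + group_b

initial = phase1_cluster(profiles, k=2)
levels = [[discretize(x) for x in p] for p in profiles]
final = phase2_recluster(levels, initial, k=2)
pattern_scores = adjusted_residuals(levels, final, 2)
```

In this sketch, the entries of `pattern_scores` above the cutoff play the role of the explicit per-cluster patterns: each key names a condition, an expression level, and a cluster. Values that match no significant pattern contribute nothing to reassignment, which is one simple way to separate relevant from irrelevant expression values.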
