Journal of Bioinformatics and Computational Biology

DYNAMIC MODEL-BASED CLUSTERING FOR TIME-COURSE GENE EXPRESSION DATA


Abstract

Microarray technology has produced a huge body of time-course gene expression data. Such gene expression data has proved useful in genomic disease diagnosis and genomic drug design. The challenge is how to uncover useful information in such data. Cluster analysis has played an important role in analyzing gene expression data. Many distance/correlation-based and static model-based clustering techniques have been applied to time-course expression data. However, these techniques are unable to account for the dynamics of such data. It is the dynamics that characterize the data, and they should be considered in cluster analysis so as to obtain high-quality clustering. This paper proposes a dynamic model-based clustering method for time-course gene expression data. The proposed method regards a time-course gene expression dataset as a set of time series generated by a number of stochastic processes. Each stochastic process defines a cluster and is described by an autoregressive model. A relocation-iteration algorithm is proposed to identify the model parameters, and posterior probabilities are employed to assign each gene to an appropriate cluster. A bootstrapping method and an average adjusted Rand index (AARI) are employed to measure the quality of clustering. Computational experiments are performed on one synthetic and three real time-course gene expression datasets to investigate the proposed method. The results show that our method yields better-quality clustering than other clustering methods (e.g. k-means) for time-course gene expression data, and thus it is a useful and powerful tool for analyzing time-course gene expression data.
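The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes Gaussian noise, a fixed autoregressive order with an intercept, least-squares parameter estimates, and hard reassignment of each series to the cluster with the highest posterior probability. All function names (`ar_design`, `fit_ar`, `log_lik`, `cluster_ar`) and parameters are hypothetical.

```python
import numpy as np

def ar_design(series, order):
    """Lagged regressors (with intercept) and targets for one time series."""
    T = len(series)
    lags = [series[order - k: T - k] for k in range(1, order + 1)]
    X = np.column_stack([np.ones(T - order)] + lags)
    return X, series[order:]

def fit_ar(members, order):
    """Pooled least-squares AR fit over all series currently in a cluster."""
    Xs, ys = zip(*(ar_design(s, order) for s in members))
    X, y = np.vstack(Xs), np.concatenate(ys)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    var = max(resid @ resid / len(y), 1e-8)  # floor avoids a zero variance
    return coef, var

def log_lik(series, coef, var, order):
    """Conditional Gaussian log-likelihood of a series under one AR model."""
    X, y = ar_design(series, order)
    resid = y - X @ coef
    return -0.5 * (len(y) * np.log(2 * np.pi * var) + resid @ resid / var)

def cluster_ar(data, n_clusters, order=1, n_iter=50, seed=0):
    """Alternate between fitting one AR model per cluster and relocating series.

    data: array of shape (n_series, n_timepoints).
    Returns hard labels and the posterior probabilities from the last pass.
    """
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=len(data))
    for _ in range(n_iter):
        models = []
        for c in range(n_clusters):
            members = data[labels == c]
            if len(members) == 0:  # assumption: re-seed an empty cluster
                members = data[rng.integers(len(data), size=1)]
            models.append(fit_ar(members, order))
        # Cluster proportions act as prior weights (clipped to stay positive).
        weights = np.clip([np.mean(labels == c) for c in range(n_clusters)],
                          1e-8, None)
        ll = np.array([[log_lik(s, coef, var, order) for coef, var in models]
                       for s in data])
        log_post = ll + np.log(weights)
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        new_labels = post.argmax(axis=1)
        if np.array_equal(new_labels, labels):  # no series moved: converged
            break
        labels = new_labels
    return labels, post
```

Calling `cluster_ar(data, n_clusters=2, order=1)` on an array of expression profiles returns one label per gene plus the posterior matrix. Because the result depends on the random initial partition, several restarts with different `seed` values would normally be compared.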
