Journal: Pattern Recognition: The Journal of the Pattern Recognition Society

Clustering of temporal gene expression data by regularized spline regression and an energy based similarity measure

Abstract

Clustering analysis of temporal gene expression data is widely used to study dynamic biological systems, for example to identify sets of genes that are regulated by the same mechanism. However, temporal gene expression data often contain noise, missing data points, and non-uniformly sampled time points, which makes it difficult for traditional clustering methods to extract meaningful information. In this paper, we introduce an improved clustering approach based on regularized spline regression and an energy-based similarity measure. The proposed approach models each gene expression profile as a B-spline expansion, whose spline coefficients are estimated from the observed data by a regularized least squares scheme. To compensate for the inadequate information in noisy and short gene expression profiles, we use the genes correlated with each profile as the test set for choosing the optimal number of basis functions and the regularization parameter. We show that this treatment helps to avoid over-fitting. After fitting continuous representations of the gene expression profiles, we use an energy-based similarity measure for clustering. This measure captures the temporal information and the relative changes of the time series through their first and second derivatives. We demonstrate that our method is robust to noise and can produce meaningful clustering results.
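The two steps described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact penalty or the exact form of the energy measure, so everything specific here is an assumption made for illustration: the helper names (`make_knots`, `fit_profile`, `energy_distance`), the second-order difference penalty on the coefficients, the weights `w1` and `w2`, and the integrated squared difference of the curves and their first and second derivatives. The selection of the number of basis functions and the regularization parameter using correlated genes is not shown.

```python
import numpy as np
from scipy.interpolate import BSpline

def make_knots(t0, t1, n_basis, degree=3):
    # Clamped knot vector with uniformly spaced interior knots (an assumption).
    inner = np.linspace(t0, t1, n_basis - degree + 1)[1:-1]
    return np.concatenate([[t0] * (degree + 1), inner, [t1] * (degree + 1)])

def fit_profile(t, y, knots, degree=3, lam=0.1):
    # Regularized least squares fit of B-spline coefficients.
    # Missing observations (NaN) are dropped; time points may be non-uniform.
    t, y = np.asarray(t, float), np.asarray(y, float)
    keep = ~np.isnan(y)
    n_basis = len(knots) - degree - 1
    # Identity coefficients make the spline return one column per basis function.
    B = BSpline(knots, np.eye(n_basis), degree)(t[keep])
    # Assumed penalty: squared second differences of neighbouring coefficients.
    D = np.diff(np.eye(n_basis), n=2, axis=0)
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y[keep])
    return BSpline(knots, coef, degree)

def energy_distance(f, g, t0, t1, w1=1.0, w2=1.0, n_grid=200):
    # Assumed measure: integral of the squared differences of the curves and
    # of their first and second derivatives, via the trapezoidal rule.
    grid = np.linspace(t0, t1, n_grid)
    v = np.zeros(n_grid)
    for order, w in enumerate((1.0, w1, w2)):
        v += w * (f(grid, nu=order) - g(grid, nu=order)) ** 2
    return float(np.sum((v[1:] + v[:-1]) / 2 * np.diff(grid)))

# Synthetic profiles on non-uniform time points, one with a missing value.
rng = np.random.default_rng(0)
t = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0, 12.0])
knots = make_knots(0.0, 12.0, n_basis=6)
y_a = np.sin(t / 2) + 0.1 * rng.normal(size=t.size)
y_a[3] = np.nan
y_b = np.sin(t / 2) + 0.1 * rng.normal(size=t.size)
y_c = np.cos(t / 2) + 0.1 * rng.normal(size=t.size)
f_a, f_b, f_c = (fit_profile(t, y, knots) for y in (y_a, y_b, y_c))

d_same = energy_distance(f_a, f_b, 0.0, 12.0)  # same underlying shape
d_diff = energy_distance(f_a, f_c, 0.0, 12.0)  # different underlying shape
```

The resulting pairwise distances could then be passed to any clustering routine that accepts a precomputed distance matrix. Because the fitted curves are continuous, two profiles can be compared even when they were sampled at different time points.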
