Clustering of temporal gene expression data by regularized spline regression and an energy based similarity measure

Zhang W.-F.; Liu C.-C.; Yan H.


Abstract

Clustering analysis of temporal gene expression data is widely used to study dynamic biological systems, for example to identify sets of genes that are regulated by the same mechanism. However, temporal gene expression data often contain noise, missing data points, and non-uniformly sampled time points, which pose challenges for traditional clustering methods in extracting meaningful information. In this paper, we introduce an improved clustering approach based on regularized spline regression and an energy based similarity measure. The proposed approach models each gene expression profile as a B-spline expansion, for which the spline coefficients are estimated by a regularized least squares scheme on the observed data. To compensate for the inadequate information in noisy and short gene expression data, we use each gene's correlated genes as the test set to choose the optimal number of basis functions and the regularization parameter. We show that this treatment helps to avoid over-fitting. After fitting the continuous representations of the gene expression profiles, we use an energy based similarity measure for clustering. The energy based measure captures the temporal information and relative changes of the time series through their first and second derivatives. We demonstrate that our method is robust to noise and can produce meaningful clustering results.
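
The two steps named in the abstract, a regularized least squares fit of a B-spline expansion followed by an energy measure built from first and second derivatives, can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not give the penalty, the energy operator, or the distance, so the sketch assumes a second-difference penalty on the coefficients, the continuous Teager-Kaiser form x'(t)^2 - x(t) x''(t) as the energy operator, and a Euclidean distance between energy profiles. All function names and parameters (`clamped_knots`, `fit_spline`, `lam`, `energy_distance`, and so on) are hypothetical. Selecting the number of basis functions and the regularization parameter with correlated genes as the test set is not shown.

```python
import numpy as np


def clamped_knots(t_min, t_max, n_basis, degree=3):
    """Clamped knot vector with uniformly spaced interior knots."""
    n_interior = n_basis - degree - 1  # requires n_basis >= degree + 1
    interior = np.linspace(t_min, t_max, n_interior + 2)[1:-1]
    return np.concatenate([[t_min] * (degree + 1), interior, [t_max] * (degree + 1)])


def bspline_basis(t, knots, degree=3):
    """Design matrix of B-spline basis functions (Cox-de Boor recursion)."""
    t = np.asarray(t, dtype=float)
    # Degree 0: indicator function of each knot interval.
    B = np.zeros((len(t), len(knots) - 1))
    for i in range(len(knots) - 1):
        B[:, i] = (knots[i] <= t) & (t < knots[i + 1])
    # Assign the right endpoint to the last non-empty interval.
    last = np.max(np.nonzero(np.diff(knots) > 0)[0])
    B[t == knots[-1], last] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        B_new = np.zeros((len(t), len(knots) - d - 1))
        for i in range(len(knots) - d - 1):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                B_new[:, i] += (t - knots[i]) / left * B[:, i]
            if right > 0:
                B_new[:, i] += (knots[i + d + 1] - t) / right * B[:, i + 1]
        B = B_new
    return B


def fit_spline(t, y, knots, degree=3, lam=1e-2):
    """Regularized least squares estimate of the spline coefficients.

    Missing observations (NaN) are dropped, so the time points may be
    non-uniform. The second-difference penalty is an assumption.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~np.isnan(y)
    B = bspline_basis(t[keep], knots, degree)
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
    return np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y[keep])


def teager_energy(x, grid):
    """Assumed energy operator: x'(t)^2 - x(t) * x''(t), by finite differences."""
    dx = np.gradient(x, grid)
    d2x = np.gradient(dx, grid)
    return dx ** 2 - x * d2x


def energy_distance(coef_a, coef_b, knots, degree=3, n_grid=200):
    """Hypothetical dissimilarity: Euclidean distance between energy profiles."""
    grid = np.linspace(knots[0], knots[-1], n_grid)
    B = bspline_basis(grid, knots, degree)
    e_a = teager_energy(B @ coef_a, grid)
    e_b = teager_energy(B @ coef_b, grid)
    return float(np.linalg.norm(e_a - e_b))
```

In use, each gene's profile would be fitted with `fit_spline` on a shared knot vector, and the pairwise values from `energy_distance` could then be passed to any standard clustering routine. Larger `lam` values give smoother fits, which is the knob the abstract's model selection step would tune.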

Bibliographic record

Source
Pattern Recognition: The Journal of the Pattern Recognition Society | 2010, No. 12 | 8 pages
Authors
Zhang W.-F.; Liu C.-C.; Yan H.
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Clustering; Energy operator; Regularized regression; Spline model; Temporal gene expression data analysis

Similar literature

1. Clustering of temporal gene expression data by regularized spline regression and an energy based similarity measure [J]. Zhang W.-F., Liu C.-C., Yan H. Pattern Recognition: The Journal of the Pattern Recognition Society. 2010, No. 12
2. A modified correlation coefficient based similarity measure for clustering time-course gene expression data [J]. Young Sook Son, Jangsun Baek. Pattern Recognition Letters. 2008, No. 3
3. Clustering of gene expression data using a local shape-based similarity measure [J]. Rajarajeswari Balasubramaniyan, Eyke Huellermeier, Nils Weskamp. Bioinformatics. 2005, No. 7
4. Biclustering of Gene Expression Data Based on SimUI Semantic Similarity Measure [C]. Juan A. Nepomuceno, Alicia Troncoso, Isabel A. Nepomuceno-Chamorro. International Conference on Hybrid Artificial Intelligent Systems. 2016
5. Using semantic similarity measures in the biomedical domain for computing functional similarity between genes based on gene ontology [D]. Khabiri, Elham. 2007
6. Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data [O]. Carsten O Daub, Ralf Steuer, Joachim Selbig. 2004
7. Association Rule Based Similarity Measures for the Clustering of Gene Expression Data [O]. Prerna Sethi. 2010