Journal: Biotechnology Progress

An adaptive strategy for single- and multi-cluster gene assignment


Abstract

Current clustering algorithms have several inadequacies: strict assignment of each gene to a single class, dimensionality reduction, a priori specification of the number of classes, the need for a training set, nonunique solutions, and complex learning mechanisms. Existing algorithms cluster genes on the basis of high positive correlations between their expression patterns. However, genes with strong negative correlations can also have similar functions and are most likely to have a role in the same pathways. To address some of these issues, we propose the adaptive centroid algorithm (ACA), which employs an analysis of variance (ANOVA)-based performance criterion. The ACA also uses Euclidean distances, the center-of-mass principle for heterogeneously distributed mass elements, and the given data set to give unique solutions. The proposed approach involves three stages. In the first stage, a two-way ANOVA of the gene expression matrix is performed; the two factors in the ANOVA are gene expression and experimental condition. The residual mean squared error (MSE) from the ANOVA is then used as the performance criterion in the ACA. Finally, correlated clusters are found on the basis of Pearson correlation coefficients. To validate the proposed approach, a two-way ANOVA is again performed on the discovered clusters. The results of this last step indicate that the MSEs of the clusters are significantly lower than that of the fibroblast-serum gene expression matrix. In this study, the ACA is employed for both single- and multi-cluster gene assignments.
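
The abstract names its building blocks but does not give the ACA's update rules, so the Python sketch below illustrates only two of those blocks and is not the authors' implementation. It assumes a genes-by-conditions matrix with one observation per cell (a two-way ANOVA without replication), which the abstract does not confirm. The function names `residual_mse` and `correlated_pairs` and the 0.9 threshold are hypothetical choices made for illustration.

```python
import numpy as np


def residual_mse(expr):
    """Residual MSE of a two-way ANOVA without replication.

    Assumption: rows are genes, columns are experimental conditions,
    and each cell holds a single expression value.
    """
    expr = np.asarray(expr, dtype=float)
    n_genes, n_conds = expr.shape
    if n_genes < 2 or n_conds < 2:
        raise ValueError("need at least 2 genes and 2 conditions")
    grand = expr.mean()
    gene_means = expr.mean(axis=1, keepdims=True)  # gene (row) effect
    cond_means = expr.mean(axis=0, keepdims=True)  # condition (column) effect
    # What remains after removing both additive main effects.
    resid = expr - gene_means - cond_means + grand
    dof = (n_genes - 1) * (n_conds - 1)
    return float((resid ** 2).sum() / dof)


def correlated_pairs(expr, threshold=0.9):
    """Gene pairs whose Pearson correlation is strong in either direction.

    Using |r| keeps strongly negative pairs, which the abstract argues
    can share function. The threshold is an illustrative assumption.
    """
    r = np.corrcoef(np.asarray(expr, dtype=float))  # rows are variables
    rows, cols = np.triu_indices(r.shape[0], k=1)   # each pair once
    return [
        (int(i), int(j), float(r[i, j]))
        for i, j in zip(rows, cols)
        if abs(r[i, j]) >= threshold
    ]


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    toy = np.array([
        [1.0, 2.0, 3.0, 4.0],
        [4.0, 3.0, 2.0, 1.0],  # mirror image of gene 0
        [1.0, 3.0, 2.0, 4.0],
    ])
    print("residual MSE:", residual_mse(toy))
    print("correlated pairs:", correlated_pairs(toy))
```

Calling `residual_mse` on the rows of a discovered cluster and comparing the result with the value for the full matrix loosely mirrors the validation step described in the abstract, under the same no-replication assumption.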
