Biclustering Methods: Biological Relevance and Application in Gene Expression Analysis

Abstract

DNA microarray technologies are used extensively to profile the expression levels of thousands of genes under various conditions, yielding extremely large data matrices. Analyzing this information and extracting biologically relevant knowledge is therefore a considerable challenge. A classical approach to this challenge is clustering (also known as one-way clustering), in which genes (or, respectively, samples) are grouped according to the similarity of their expression profiles across all samples (or, respectively, all genes). An alternative approach is biclustering, which identifies local patterns in the data: these methods extract subgroups of genes that are co-expressed across only a subset of samples, and such subgroups may have important biological or medical implications. In this study we evaluate 13 biclustering methods and 2 clustering methods (k-means and hierarchical clustering) and compare their performance on two real gene expression data sets. For this purpose we apply four evaluation measures: (1) how well the (bi)clustering methods differentiate the various sample types; (2) how well the groups of genes discovered by the (bi)clustering methods are annotated with similar Gene Ontology categories; (3) how well the methods differentiate genes that are known to be specific to the particular sample types studied; and (4) the running time of the algorithms. We conclude that, as long as the samples are well defined and annotated, sample contamination is limited, and the samples are well replicated, biclustering methods such as Plaid and SAMBA are useful for discovering relevant subsets of genes and samples.
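To make the contrast between one-way clustering and biclustering concrete, here is a minimal Python sketch. It is not the evaluation pipeline used in the study. It scores a candidate bicluster with the mean squared residue (MSR) introduced by Cheng and Church, which is one common coherence criterion in the biclustering literature. The expression matrix and the function name `mean_squared_residue` are hypothetical and exist only for illustration. The sketch shows that a group of genes can be perfectly coherent over a subset of samples (MSR of 0) yet look noisy once all samples are included. One-way clustering over all samples can miss exactly this kind of local pattern.

```python
from itertools import product


def mean_squared_residue(matrix, rows, cols):
    """Cheng-Church mean squared residue of the submatrix matrix[rows][cols].

    A score of 0 means every entry equals (row effect + column effect), i.e. a
    perfectly coherent additive bicluster; larger scores mean less coherence.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i, j in product(range(n_rows), range(n_cols)):
        # Residue: what is left after removing the row, column, and overall effects.
        residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
        total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical expression matrix: 4 genes (rows) x 5 samples (columns).
# Genes 0-2 follow an additive pattern on samples 0-2 only.
expr = [
    [1.0, 2.0, 3.0, 9.0, 0.0],
    [2.0, 3.0, 4.0, 0.0, 7.0],
    [5.0, 6.0, 7.0, 4.0, 1.0],
    [8.0, 1.0, 6.0, 2.0, 3.0],
]

coherent = mean_squared_residue(expr, [0, 1, 2], [0, 1, 2])
all_samples = mean_squared_residue(expr, [0, 1, 2], list(range(5)))
print(f"MSR, genes 0-2 x samples 0-2: {coherent:.3g}")
print(f"MSR, genes 0-2 x all samples: {all_samples:.3g}")
```

On samples 0 to 2, genes 0 to 2 differ only by constant row and column offsets, so the residue is zero up to floating-point rounding. Adding samples 3 and 4 breaks the pattern and raises the score. Other methods, such as Plaid and SAMBA, define biclusters with their own statistical models, so this score is only one illustrative criterion.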
