Journal: Algorithms for Molecular Biology

DeBi: Discovering Differentially Expressed Biclusters using a Frequent Itemset Approach

Abstract

Background: Clustering massive high-throughput data is an important way to elucidate gene functions in biological systems. However, traditional clustering methods have several drawbacks. Biclustering overcomes these limitations by grouping genes and samples simultaneously: it discovers subsets of genes that are co-expressed in certain samples. Recent studies have shown that biclustering has great potential for detecting marker genes associated with particular tissues or diseases. Several biclustering algorithms have been proposed, but finding biclusters that are significant according to biological validation measures remains a challenge. There is also a need for a biclustering algorithm that can analyze very large datasets in reasonable time.

Results: Here we present a fast biclustering algorithm called DeBi (Differentially Expressed BIclusters). The algorithm is based on a well-known data mining approach, frequent itemset mining. It discovers maximum-size homogeneous biclusters in which each gene is strongly associated with a subset of samples. We evaluate the performance of DeBi on a yeast dataset, on synthetic datasets, and on human datasets.

Conclusions: Using biological validation measures such as Gene Ontology term enrichment and Transcription Factor Binding Site enrichment, we demonstrate that DeBi produces functionally more coherent gene sets than standard clustering or biclustering algorithms. We show that DeBi is a computationally efficient and powerful tool for analyzing large datasets. The method is also applicable to multiple gene expression datasets coming from different labs or platforms.
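To make the frequent-itemset view of biclustering concrete, the following is a toy Python sketch of the general idea, not the DeBi implementation. It assumes the expression matrix has already been binarized (1 means the gene is flagged as differentially expressed in that sample), treats each gene as a transaction whose items are sample indices, and enumerates sample sets by brute force. The function names, thresholds, and example data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def frequent_sample_sets(binary_rows, min_genes, min_samples=2):
    """Map each sample set shared by at least `min_genes` genes to those genes.

    `binary_rows` is {gene_name: [0/1 per sample]}; this input format is an
    assumption of the sketch, not something specified by the abstract.
    """
    n_samples = len(next(iter(binary_rows.values())))
    # Transaction view: gene -> set of samples in which it is flagged.
    transactions = {
        gene: frozenset(j for j, flag in enumerate(row) if flag)
        for gene, row in binary_rows.items()
    }
    found = {}
    for size in range(min_samples, n_samples + 1):
        level = {}
        for cols in combinations(range(n_samples), size):
            sample_set = frozenset(cols)
            genes = sorted(g for g, t in transactions.items() if sample_set <= t)
            if len(genes) >= min_genes:
                level[sample_set] = genes
        if not level:
            # Support is anti-monotone: if no set of this size is frequent,
            # no larger set can be frequent either.
            break
        found.update(level)
    return found


def maximal_only(found):
    """Keep only sample sets that are not contained in a larger frequent set."""
    return {s: g for s, g in found.items() if not any(s < other for other in found)}


# Hypothetical binarized matrix: 5 genes x 5 samples.
example = {
    "g1": [1, 1, 1, 0, 0],
    "g2": [1, 1, 1, 0, 0],
    "g3": [1, 1, 1, 0, 1],
    "g4": [0, 0, 0, 1, 1],
    "g5": [0, 0, 0, 1, 1],
}

biclusters = maximal_only(frequent_sample_sets(example, min_genes=2))
for samples, genes in sorted((sorted(s), g) for s, g in biclusters.items()):
    print(samples, genes)
# [0, 1, 2] ['g1', 'g2', 'g3']
# [3, 4] ['g4', 'g5']
```

Each maximal frequent sample set, together with the genes that support it, forms one candidate bicluster. The brute-force enumeration is exponential in the number of samples, so it only suits small examples; a dedicated maximal frequent itemset miner would be needed for datasets of realistic size.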
