Briefings in Bioinformatics

A comparative analysis of biclustering algorithms for gene expression data



Abstract

The need to analyze high-dimensional biological data is driving the development of new data mining methods. Biclustering algorithms have been successfully applied to gene expression data to discover local patterns, in which a subset of genes exhibits similar expression levels over a subset of conditions. However, it is not clear which algorithms are best suited for this task. Many algorithms have been published in the past decade, most of which have been compared only to a small number of algorithms. Surveys and comparisons exist in the literature, but because of the large number and variety of biclustering algorithms, they are quickly outdated. In this article we partially address this problem of evaluating the strengths and weaknesses of existing biclustering methods. We used the BiBench package to compare 12 algorithms, many of which were recently published or have not been extensively studied. The algorithms were tested on a suite of synthetic data sets to measure their performance on data with varying conditions, such as different bicluster models, varying noise, varying numbers of biclusters and overlapping biclusters. The algorithms were also tested on eight large gene expression data sets obtained from the Gene Expression Omnibus. Gene Ontology enrichment analysis was performed on the resulting biclusters, and the best enrichment terms are reported. Our analyses show that the biclustering method and its parameters should be selected based on the desired model, whether that model allows overlapping biclusters, and its robustness to noise. In addition, we observe that the biclustering algorithms capable of finding more than one model are more successful at capturing biologically relevant clusters.
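To make the synthetic-data setting concrete, the following is a minimal sketch of implanting one bicluster (a subset of genes with shifted expression over a subset of conditions) into Gaussian noise and scoring how well a candidate bicluster recovers it. It uses only NumPy and is not BiBench's API. The function names, the constant-shift bicluster model, the parameter defaults, and the cell-level Jaccard score are all assumptions chosen for illustration and are not necessarily what the article used.

```python
import numpy as np


def make_synthetic(n_rows=60, n_cols=40, bic_rows=15, bic_cols=10,
                   shift=4.0, noise=1.0, seed=0):
    """Return (data, rows, cols): a noise matrix with one implanted bicluster.

    Hypothetical helper: the implanted model is a constant upward shift,
    which is only one of several possible bicluster models.
    """
    rng = np.random.default_rng(seed)
    # Background: genes x conditions matrix of Gaussian noise.
    data = rng.normal(0.0, noise, size=(n_rows, n_cols))
    # Pick the genes (rows) and conditions (columns) of the bicluster.
    rows = rng.choice(n_rows, size=bic_rows, replace=False)
    cols = rng.choice(n_cols, size=bic_cols, replace=False)
    # Shift the expression level of the selected submatrix.
    data[np.ix_(rows, cols)] += shift
    return data, set(rows.tolist()), set(cols.tolist())


def jaccard_cells(rows_a, cols_a, rows_b, cols_b):
    """Jaccard index of two biclusters, counted over matrix cells."""
    inter = len(rows_a & rows_b) * len(cols_a & cols_b)
    union = len(rows_a) * len(cols_a) + len(rows_b) * len(cols_b) - inter
    return inter / union if union else 0.0


data, true_rows, true_cols = make_synthetic()
# A candidate that misses some true genes scores below 1.
partial_rows = set(sorted(true_rows)[:10])
score = jaccard_cells(partial_rows, true_cols, true_rows, true_cols)
```

Raising `noise` or lowering `shift` in this sketch makes the implanted block harder to distinguish from the background, which is the kind of varying condition the synthetic tests described above are meant to probe.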
