...
International Journal of Machine Learning and Cybernetics

Discovery of bidirectional contiguous column coherent bicluster in time-series gene expression data

Abstract

The application of high-throughput microarrays has produced massive amounts of gene expression data, creating a need for effective analysis methods. Biclustering has emerged as a useful tool: it clusters rows and columns simultaneously to find subsets of genes and conditions with coherent expression. In the analysis of time-series gene expression data in particular, it is meaningful to restrict biclusters to contiguous time points with coherent evolutions. In this paper, the BCCC-Bicluster (bidirectional contiguous column coherent bicluster) is proposed as an extension of the CCC-Bicluster. An exact algorithm based on frequent sequential pattern mining is proposed to find all maximal BCCC-Biclusters. A newly defined Frequent-Infrequent Tree-Array (FITA) is constructed to speed up the traversal, together with strategies derived from the Apriori property that avoid redundant work. To further improve efficiency, the bitwise XOR operation is applied to capture identical or opposite contiguous patterns between two rows. The algorithm is tested on simulated data, yeast microarray data and human microarray data. The experimental results show that the proposed algorithm recovers the biclusters planted in the synthetic data better than CCC-Biclusters do, and that it outperforms a variant without FITA in speed and scalability. In the enrichment analysis, BCCC-Biclusters are shown to find more significant GO terms involved in biological processes than three other kinds of recent biclusters.
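The abstract does not give the paper's encoding, so the following is only a minimal Python sketch of the XOR idea under stated assumptions. Each transition between contiguous time points is discretized to a single bit (1 = up, 0 = down or flat), which collapses any separate "no change" symbol that the CCC-Bicluster formulation may use. The function names `discretize`, `to_bits` and `coherent_runs` and the `min_len` threshold are hypothetical. XOR-ing two packed rows yields a 0 bit where both rows change in the same direction and a 1 bit where they change in opposite directions, so maximal runs of equal XOR bits mark contiguous column spans with identical or opposite patterns.

```python
# Minimal sketch (hypothetical names): find contiguous time spans where two
# genes change in the same or opposite direction, using a bitwise XOR.
from typing import List, Tuple


def discretize(row: List[float]) -> List[int]:
    """Encode each transition between contiguous time points as one bit.

    Assumption: 1 = expression goes up, 0 = goes down or stays flat.
    """
    return [1 if b > a else 0 for a, b in zip(row, row[1:])]


def to_bits(pattern: List[int]) -> int:
    """Pack a 0/1 pattern into an integer; bit i holds transition i."""
    value = 0
    for i, bit in enumerate(pattern):
        value |= bit << i
    return value


def coherent_runs(p: List[int], q: List[int], min_len: int = 2) -> List[Tuple[int, int, str]]:
    """Return maximal spans (first_time_point, last_time_point, kind).

    XOR bit 0 means both rows moved the same way at that transition,
    XOR bit 1 means they moved in opposite ways. A run of equal XOR bits
    over transitions s..e covers time points s..e+1. Only runs with at
    least `min_len` transitions are reported.
    """
    if len(p) != len(q):
        raise ValueError("patterns must have the same length")
    n = len(p)
    x = to_bits(p) ^ to_bits(q)
    bits = [(x >> i) & 1 for i in range(n)]
    runs: List[Tuple[int, int, str]] = []
    start = 0
    for i in range(1, n + 1):
        # Close the current run at the end of the pattern or when the XOR bit flips.
        if i == n or bits[i] != bits[start]:
            if i - start >= min_len:
                kind = "identical" if bits[start] == 0 else "opposite"
                runs.append((start, i, kind))
            start = i
    return runs


if __name__ == "__main__":
    a = discretize([1.0, 2.0, 3.0, 2.5, 2.0, 4.0])  # [1, 1, 0, 0, 1]
    b = discretize([5.0, 6.0, 7.0, 8.0, 9.0, 8.0])  # [1, 1, 1, 1, 0]
    print(coherent_runs(a, b))  # [(0, 2, 'identical'), (2, 5, 'opposite')]
```

For the two example rows, the sketch reports an identical span over time points 0 to 2 and an opposite span over time points 2 to 5. It covers only the pairwise comparison. The exact algorithm described in the abstract additionally mines frequent patterns across many rows and prunes the search with FITA and Apriori-based strategies, which are not reproduced here.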