Journal: BMC Genomics

Transcription network construction for large-scale microarray datasets using a high-performance computing approach



Abstract

Background: Advances in high-throughput genomic technologies, including microarrays, have demonstrated the potential to generate a tremendous amount of gene expression data for the entire genome. Deciphering transcriptional networks that convey information on intracluster correlations and intercluster connections of genes is a crucial analysis task in the post-sequence era. Most existing analysis methods for genome-wide gene expression profiles consist of several steps that often require human involvement based on experiential knowledge, which is generally difficult to acquire and formalize. Moreover, large-scale datasets typically incur prohibitively expensive computation overhead and thus result in a long experiment-analysis research cycle.

Results: We propose a parallel computation-based random matrix theory approach that analyzes the cross correlations of gene expression data in an entirely automatic and objective manner, eliminating the ambiguities and subjectivity inherent in human decisions. We apply the proposed approach to publicly available human liver cancer data and yeast cycle data, and generate transcriptional networks that illustrate interacting functional modules. The experimental results agree closely with those published in the previous literature.

Conclusions: The correlations calculated from experimental measurements typically contain both “genuine” and “random” components. In the proposed approach, we remove the “random” component by testing the statistics of the eigenvalues of the correlation matrix against a “null hypothesis”: a truly random correlation matrix obtained from mutually uncorrelated expression data series. Our investigation into the components of the deviating eigenvectors after varimax orthogonal rotation reveals distinct functional modules.

The use of high-performance computing resources, including the ScaLAPACK package, a supercomputer, and a Linux PC cluster, in our implementations and experiments significantly reduces the computation time that would otherwise be needed on a single workstation. More importantly, the large distributed shared memory and parallel computing power allow us to process genomic datasets of enormous size.
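
The eigenvalue test described in the Conclusions can be illustrated with a small single-machine sketch. This is not the authors' implementation: it assumes the Marchenko-Pastur upper edge as the null bound for a correlation matrix of mutually uncorrelated series (the abstract does not say which eigenvalue statistic is tested), it uses NumPy's dense eigensolver instead of ScaLAPACK, and it omits the varimax rotation. The function name `deviating_eigenpairs` and the synthetic data are hypothetical.

```python
import numpy as np


def deviating_eigenpairs(expr):
    """Return eigenpairs of the gene-gene correlation matrix that exceed
    the Marchenko-Pastur upper edge (assumed null bound, see lead-in).

    expr: array of shape (n_genes, n_samples).
    """
    n_genes, n_samples = expr.shape
    # Standardize each gene's expression series (zero mean, unit variance).
    z = expr - expr.mean(axis=1, keepdims=True)
    z = z / expr.std(axis=1, keepdims=True)
    # Pearson correlation matrix between genes.
    corr = z @ z.T / n_samples
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    # Upper edge of the Marchenko-Pastur spectrum for uncorrelated,
    # unit-variance series: (1 + sqrt(n_genes / n_samples))^2.
    lambda_max = (1.0 + np.sqrt(n_genes / n_samples)) ** 2
    keep = eigvals > lambda_max
    return eigvals[keep], eigvecs[:, keep], lambda_max


# Synthetic example (hypothetical data): 200 genes, 400 samples,
# where the first 20 genes share one common expression factor.
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 400))
factor = rng.standard_normal(400)
data[:20] = factor + 0.5 * data[:20]

vals, vecs, bound = deviating_eigenpairs(data)
# Genes with the largest loadings on the top deviating eigenvector.
top_genes = np.argsort(np.abs(vecs[:, -1]))[-20:]
print("null bound:", bound)
print("deviating eigenvalues:", vals)
print("genes in leading module:", sorted(top_genes.tolist()))
```

In this sketch, eigenvalues below the bound are treated as consistent with random correlations, and the gene loadings of the remaining eigenvectors point to candidate modules. For datasets of the size the abstract targets, the dense eigendecomposition is the step that would be handed to a distributed solver.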
