Source: Supercomputing, 2005. Proceedings of the ACM/IEEE SC 2005 Conference

Genome-Scale Computational Approaches to Memory-Intensive Applications in Systems Biology


Abstract

Graph-theoretical approaches to biological network analysis have proven effective for small networks but are computationally infeasible for comprehensive genome-scale, systems-level elucidation of these networks. The difficulty lies in the NP-hard nature of many global systems biology problems, which in practice translates to exponential (or worse) run times for finding exact optimal solutions. Moreover, these problems, especially those of an enumerative flavor, are often memory-intensive and must share very large data sets effectively across many processors. For example, the enumeration of maximal cliques, a core component in gene expression network analysis, cis-regulatory motif finding, and the study of quantitative trait loci for high-throughput molecular phenotypes, can result in as many as 3^(n/3) maximal cliques for a graph with n vertices. Memory requirements to store those cliques reach terabyte scales even on modest-sized genomes. Emerging hardware architectures with ultra-large globally addressable memory, such as the SGI Altix and Cray X1, seem well suited to these types of data-intensive problems in systems biology. This paper presents a novel framework that provides exact, parallel, and scalable solutions to various graph-theoretical approaches to genome-scale elucidation of biological networks. The framework takes advantage of these large-memory architectures by creating globally addressable bitmap memory indices with potentially high compression rates, fast bitwise-logical operations, and reduced search space. Augmented with recent theoretical advancements based on fixed-parameter tractability, this framework produces computationally feasible performance for genome-scale combinatorial problems of systems biology.
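To make the role of bitmaps and bitwise-logical operations concrete, the following is a minimal single-machine sketch and not the paper's framework. It assumes the standard Bron-Kerbosch algorithm with pivoting (a substitute chosen for illustration; the abstract does not name the enumeration algorithm) and stores each vertex set as a Python integer used as a bitmap. It has no compression, parallelism, or globally addressable memory. All function names (build_bitmaps, maximal_cliques, moon_moser_edges) are hypothetical. The example graph is the complete multipartite graph with parts of size 3, which attains the 3^(n/3) bound on the number of maximal cliques.

```python
from typing import Iterable, Iterator, List, Tuple


def build_bitmaps(n: int, edges: Iterable[Tuple[int, int]]) -> List[int]:
    """One adjacency bitmap per vertex: bit j of adj[i] is set iff {i, j} is an edge."""
    adj = [0] * n
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u] |= 1 << v
            adj[v] |= 1 << u
    return adj


def bits(mask: int) -> Iterator[int]:
    """Yield the indices of the set bits in mask, lowest first."""
    while mask:
        low = mask & -mask          # isolate the lowest set bit
        yield low.bit_length() - 1
        mask ^= low


def maximal_cliques(adj: List[int]) -> Iterator[List[int]]:
    """Bron-Kerbosch with pivoting; R, P and X are integer bitmaps.

    Assumption: this is a textbook algorithm used for illustration,
    not the method described in the paper.
    """
    def expand(r: int, p: int, x: int) -> Iterator[List[int]]:
        if p == 0 and x == 0:
            yield list(bits(r))     # R cannot be extended: it is maximal
            return
        # Pivot: the vertex of P|X with the most neighbours in P.
        pivot = max(bits(p | x), key=lambda u: bin(p & adj[u]).count("1"))
        # Only candidates outside the pivot's neighbourhood need branching.
        for v in bits(p & ~adj[pivot]):
            vbit = 1 << v
            # Set intersections are single bitwise ANDs.
            yield from expand(r | vbit, p & adj[v], x & adj[v])
            p &= ~vbit              # move v from P ...
            x |= vbit               # ... to X

    yield from expand(0, (1 << len(adj)) - 1, 0)


def moon_moser_edges(k: int) -> List[Tuple[int, int]]:
    """Edges of the complete k-partite graph with parts of size 3 (n = 3k)."""
    n = 3 * k
    return [(i, j) for i in range(n) for j in range(i + 1, n) if i // 3 != j // 3]


if __name__ == "__main__":
    k = 3
    adj = build_bitmaps(3 * k, moon_moser_edges(k))
    print(sum(1 for _ in maximal_cliques(adj)))  # 27, i.e. 3^(9/3)
```

Even in this toy form the growth is visible: each additional part of three vertices triples the number of cliques to be stored, which is why the abstract stresses memory capacity and compact bitmap indices.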
