IEEE International Symposium on Bioinformatics and Bioengineering

MotifNetwork: A Grid-enabled Workflow for High-throughput Domain Analysis of Biological Sequences: Implications for annotation and study of phylogeny, protein interactions, and intraspecies variation

Abstract

Traditionally, bioinformatics has been organized around the concepts of genes and gene products, typically proteins. Proteins are represented as sequences of amino acids and are compared with one another by aligning those sequences and measuring their similarity. However, proteins contain subsequences that define their activity and mode of regulation. These subsequences are referred to as "domains" and "motifs". For understanding many aspects of gene function, gene interaction, and gene and organism evolution, it is advantageous to focus analysis on the domain/motif level rather than on the gene level. Such analysis is inherently computationally intensive because of the exponential growth of protein databases and the combinatorial number of ways in which domains and motifs interact with each other. Here we report, by means of a biological example, on our efforts to build a user-friendly environment that facilitates such analysis. This environment is called MotifNetwork. MotifNetwork is an integration effort to build a suite of biologically oriented, grid-enabled workflows for high-throughput domain analysis of protein sequences. Workflow orchestration and enactment are handled with Taverna [Oinn, 2004]. The supporting grid-enabling services used to wrap and invoke the computational applications are implemented with the Generic Service Toolkit (GST) [Kandaswamy, 2006]. The final outputs of this environment are data products organized as matrices, together with visualization files suitable for quick analysis. Detailed descriptions of the data products from a representative biological example are presented. Lastly, some preliminary performance data are presented, including use of the workflow to determine the domain architecture of all proteins in a complete genome (the honeybee). An extension to comprehensive analysis of SNPs in a genome is discussed.
The MotifNetwork workflow is, or soon will be, available online through the RENCI Science Gateway at http://www.tgbioportal.org/.
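The abstract says MotifNetwork's results are data products organized as matrices, built from the domain architecture of each protein. The sketch below is not MotifNetwork's code and does not reproduce its output format. It only illustrates, under assumed inputs, how per-protein domain hits could be turned into ordered domain architectures and a protein-by-domain count matrix. The `(protein, domain, start, end)` tuple format, the functions `domain_architectures` and `presence_matrix`, and the example proteins and domain labels are all hypothetical.

```python
from collections import defaultdict

# Hypothetical hit format (an assumption, not MotifNetwork's):
# (protein_id, domain_label, start, end), e.g. parsed from a domain-search tool.
Hit = tuple[str, str, int, int]


def domain_architectures(hits: list[Hit]) -> dict[str, list[str]]:
    """Return each protein's domains ordered by start position (its architecture)."""
    per_protein: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for protein, domain, start, _end in hits:
        per_protein[protein].append((start, domain))
    # Sorting (start, domain) pairs orders domains N- to C-terminus.
    return {p: [d for _, d in sorted(doms)] for p, doms in per_protein.items()}


def presence_matrix(hits: list[Hit]) -> tuple[list[str], list[str], list[list[int]]]:
    """Build a protein x domain matrix counting how often each domain occurs."""
    proteins = sorted({h[0] for h in hits})
    domains = sorted({h[1] for h in hits})
    row = {p: i for i, p in enumerate(proteins)}
    col = {d: j for j, d in enumerate(domains)}
    matrix = [[0] * len(domains) for _ in proteins]
    for protein, domain, _start, _end in hits:
        matrix[row[protein]][col[domain]] += 1
    return proteins, domains, matrix


# Illustrative (made-up) hits for two proteins.
hits: list[Hit] = [
    ("P1", "kinase", 10, 270),
    ("P1", "SH2", 300, 380),
    ("P2", "SH2", 5, 85),
    ("P2", "SH3", 100, 150),
]

if __name__ == "__main__":
    print(domain_architectures(hits))
    print(presence_matrix(hits))
```

Rows of such a matrix can then be compared across proteins or genomes at the domain level rather than by whole-sequence alignment, which is the shift in emphasis the abstract argues for.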