Prediction-based genome annotation, domain assignment methods, and their applications in structural genomics.

Abstract

The explosion of sequence information in the post-genomic era is steadily widening the gap between the number of protein sequences deposited in public databases and the experimental characterization of these proteins. Computational biology plays a central role in bridging this gap.

In this thesis, I analyzed more than sixty completely sequenced proteomes using various computational methods. The structural and functional annotations for each protein in these proteomes have been made publicly available through the database PEP. Systematic comparison of different proteomes yielded several interesting findings regarding evolution. For example, bacteria seemed to have smaller fractions of proteins responsible for communication than multi-cellular organisms. Sequence analysis on a genomic scale also led to the discovery of a class of proteins that have long regions of NO-Regular Secondary Structure (NORS) and appear to play significant functional roles. NORS proteins are much more abundant in eukaryotes, evolutionarily conserved, important in protein-protein interaction, and over-represented among proteins with regulatory and transcription-related functions.

I have also contributed to target selection for the Northeast Structural Genomics Consortium (NESG) and established an automatic target selection procedure for the consortium. My study revealed that structural genomics might have to target about 48% of all proteins and 52% of residues in the currently known proteomes. I estimated that it might be necessary to experimentally determine over 40,000 structures to minimally cover five eukaryotic proteomes. I also demonstrated that sequence clustering must begin with protein domains, and I developed two sequence-based domain assignment methods. CHOP, a homology-based method, was able to dissect 70% of proteins into domain-like fragments. Two results stood out from this comprehensive, though still preliminary, analysis of structural domains in entire proteomes: (1) over 70% of all dissected proteins contained more than one fragment, and (2) the number of CHOP fragments in a protein correlated linearly with the length of the protein. Since not all proteins could be dissected by CHOP into structural domains, I developed ChopNet, a new neural network-based method that predicts domains from sequence. It correctly predicts the number of domains for 55% of all proteins and the domain boundary positions for 49% of two-domain proteins.

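Bibliographic record

  • Author

    Liu, Jinfeng

  • Author affiliation

    Columbia University

  • Degree-granting institution: Columbia University
  • Subject: Health Sciences, Pharmacology; Biology, Biostatistics
  • Degree: Ph.D.
  • Year: 2004
  • Pages: 177 p.
  • Total pages: 177
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Pharmacology; Biomathematical methods
  • Keywords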
