Navigating the extremes of biological datasets for reliable structural inference and design.

Abstract

Structural biologists currently face serious challenges in interpreting experimental data because of two contradictory situations: a severe lack of structural data for certain classes of proteins, and an enormous abundance of data for others. The challenge with small data sets is extracting enough information to draw meaningful conclusions, while the challenge with large data sets is curating, categorizing, and searching the data so that it can be meaningfully interpreted and applied to scientific problems. Here, we develop computational strategies to address both sparse and abundant data sets. For sparse data, we focus on the problem of determining transmembrane (TM) protein structures. Because X-ray crystallography and NMR data are notoriously difficult to obtain for TM proteins, we develop a novel algorithm that uses low-resolution data from protein cross-linking or scanning mutagenesis studies to produce models of TM helix oligomers, and we show that, on a test set of known TM proteins, our method produces models with accuracy on par with X-ray crystallography or NMR. Turning to data abundance, we examine how to mine the vast store of protein structural data in the Protein Data Bank (PDB) to aid the design of proteins with novel binding properties. We show how identifying an anion-binding motif in an antibody structure allowed us to develop a phosphate-binding module that can be used to produce novel antibodies to phosphorylated peptides; to illustrate the utility of this approach, we created antibodies to 7 novel phosphopeptides. We then describe a general strategy for designing binders to a target protein epitope by recapitulating protein interaction geometries that are over-represented in the PDB. Next, we use data describing the transition probabilities of amino acids to develop a novel set of degenerate codons for creating more efficient gene libraries. We conclude by describing a novel, real-time, all-atom structural search engine that lets researchers quickly search known protein structures for a motif of interest and provides a new interactive paradigm for protein design.
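The abstract does not list the new degenerate codons or the transition-probability data behind them, so the sketch below only illustrates the basic bookkeeping involved in evaluating any degenerate codon: expanding a codon written in IUPAC nucleotide ambiguity codes into the amino acids (and stop codons, shown as `*`) it encodes. It assumes the standard genetic code and equimolar nucleotide mixtures at each ambiguous position. The names `IUPAC`, `CODON_TABLE` and `expand` are hypothetical, and this is not the thesis's procedure for choosing codons.

```python
from collections import Counter
from itertools import product

# IUPAC nucleotide ambiguity codes (assumed equimolar mixes at each position).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, encoded in the conventional TCAG ordering:
# index = 16 * (first base) + 4 * (second base) + (third base).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon: str) -> Counter:
    """Count how many concrete codons in a degenerate codon map to each
    amino acid ('*' marks stop codons)."""
    if len(degenerate_codon) != 3:
        raise ValueError("a codon must have exactly three positions")
    options = [IUPAC[base] for base in degenerate_codon.upper()]
    return Counter(CODON_TABLE["".join(codon)] for codon in product(*options))


if __name__ == "__main__":
    for codon in ("NNK", "NDT"):
        counts = expand(codon)
        print(codon, sum(counts.values()), "codons:", dict(sorted(counts.items())))
```

For example, `expand("NNK")` enumerates 32 codons that cover all 20 amino acids plus one stop codon (TAG), while `expand("NDT")` enumerates 12 codons, each encoding a different amino acid, with no stops. Comparing distributions like these against a desired amino-acid distribution is one plausible way to score candidate codons for a gene library.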

Bibliographic details

  • Author

    Hannigan, Brett T.

  • Affiliation

    University of Pennsylvania

  • Degree-granting institution: University of Pennsylvania
  • Subjects: Biology, Bioinformatics; Biophysics, Biomechanics; Biology, Molecular
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 217
  • Original format: PDF
  • Language: English