Indexing methods for protein tertiary and predicted structures.


Abstract

This thesis addresses fast substructure search and remote homology detection in proteins by finding similar (sub)structures. That is, given a query protein and a large database of protein structures, we want to retrieve all similar structures from the database rapidly. As the number of proteins deposited in the database grows, searching it becomes a difficult and time-consuming task. High-throughput proteomics methods are already accumulating the protein interaction data that we would wish to model, but fast computational methods for database searching lag far behind; biologists need a means of searching protein structure databases rapidly, much as BLAST rapidly searches sequence databases.

We are interested in two main problems that arise in substructure and remote homology searches: tertiary structure indexing and, for proteins whose structures have not been determined experimentally, predicted structure indexing. In our tertiary structure indexing approach, we present a new method for extracting local feature vectors from protein structures. Each residue is represented by a triangle, and the correlation between a set of residues is described by the distances between Cα atoms and the angles between the normals of the planes in which the triangles lie. The normalized local feature vectors are indexed using a suffix tree. For each query segment, the suffix tree is used to retrieve the maximal matches efficiently, and these matches are then chained to obtain alignments with database proteins. Similar proteins are selected by their alignment score against the query.

In our predicted structure indexing approach, a hidden Markov model (HMMSTR) of local sequence-structure motifs (the I-sites library) is used to generate the feature vectors for the structure predicted for a given sequence. Remote homologous proteins are detected by using the suffix tree index over the predicted structures.

We test our algorithms on several real datasets. We improve both the running time and the accuracy of tertiary structure indexing and classification. We also find more remote homologous proteins in the database of predicted structures than competing methods do.
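The tertiary structure pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several details are assumptions made only for illustration: the residue triangle is taken to be the backbone atoms N, CA, and C; "normalization" is approximated by binning each (distance, angle) pair into a discrete symbol with arbitrary bin widths; and the suffix tree is replaced by a naive pairwise scan that returns the same maximal matches, only more slowly. The function names (`pair_feature`, `to_symbol`, `maximal_matches`) are hypothetical, and the chaining step is not shown.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(sum(x * x for x in a))

def residue_normal(res):
    """Unit normal of the residue triangle (assumed here to be N, CA, C)."""
    v = cross(sub(res["N"], res["CA"]), sub(res["C"], res["CA"]))
    length = norm(v)
    return tuple(x / length for x in v)

def pair_feature(res_i, res_j):
    """Return (CA-CA distance, angle in degrees between triangle normals)."""
    dist = norm(sub(res_i["CA"], res_j["CA"]))
    ni, nj = residue_normal(res_i), residue_normal(res_j)
    cos_angle = sum(x * y for x, y in zip(ni, nj))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return dist, math.degrees(math.acos(cos_angle))

def to_symbol(feature, dist_bin=1.0, angle_bin=30.0):
    """Discretize a feature into a hashable symbol (bin widths are assumptions)."""
    dist, angle = feature
    return (int(dist // dist_bin), int(angle // angle_bin))

def symbol_sequence(residues, offset=1):
    """One symbol per residue pair (i, i + offset) along the chain."""
    return [to_symbol(pair_feature(residues[i], residues[i + offset]))
            for i in range(len(residues) - offset)]

def maximal_matches(query, target, min_len=2):
    """Naive O(n*m) scan for maximal common runs, as (q_start, t_start, length).

    A suffix tree over the database sequences would retrieve the same matches
    without comparing every pair of positions.
    """
    matches = []
    for i in range(len(query)):
        for j in range(len(target)):
            if query[i] != target[j]:
                continue
            if i > 0 and j > 0 and query[i - 1] == target[j - 1]:
                continue  # extendable to the left, so not maximal
            k = 0
            while (i + k < len(query) and j + k < len(target)
                   and query[i + k] == target[j + k]):
                k += 1
            if k >= min_len:
                matches.append((i, j, k))
    return matches
```

Under these assumptions, each protein chain becomes a sequence of symbols via `symbol_sequence`, and `maximal_matches(query_symbols, target_symbols)` yields the seed segments that a chaining step would then combine into an alignment and score.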

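Bibliographic record

  • Author: Gao, Feng
  • Author affiliation: Rensselaer Polytechnic Institute
  • Degree-granting institution: Rensselaer Polytechnic Institute
  • Subject: Biology, Bioinformatics; Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 73 p.
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology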
