Indexing methods for protein tertiary and predicted structures.

Abstract

This thesis addresses fast substructure search and remote homology detection in proteins by finding similar (sub)structures: given a query protein and a large database of protein structures, we want to retrieve all similar structures from the database rapidly. As the number of deposited proteins grows, searching the database becomes difficult and time-consuming. High-throughput proteomics methods are already accumulating the protein interaction data that we would wish to model, but fast computational methods for database searching lag far behind. Biologists need a way to search protein structure databases rapidly, much as BLAST rapidly searches sequence databases.

We are interested in two main problems that arise in substructure and remote homology searches: indexing protein tertiary structures, and indexing predicted structures for proteins whose structures have not been determined experimentally. For tertiary structure indexing, we present a new method for extracting local feature vectors from protein structures. Each residue is represented by a triangle, and the relationship among a set of residues is described by the distances between their Cα atoms and the angles between the normals of the planes in which the triangles lie. The normalized local feature vectors are indexed with a suffix tree. For each query segment, the suffix tree is used to retrieve maximal matches, which are then chained to obtain alignments with database proteins; similar proteins are selected by their alignment score against the query. For predicted structure indexing, a hidden Markov model (HMMSTR) built on the I-sites library of local motifs with high sequence-structure correlation is used to generate the feature vectors for the structure predicted for a given sequence. Remote homologs are then detected using the suffix tree index over the predicted structures.

We test our algorithms on several real datasets. We improve both the running time and the accuracy of tertiary structure indexing and classification, and we find more remote homologs in the database of predicted structures than competing methods do.
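The geometric description in the abstract (one triangle per residue, Cα distances, and angles between triangle normals) can be illustrated with a minimal sketch. The abstract does not say which atoms form each triangle or how many neighbouring residues enter a feature vector, so the backbone N, Cα, C atoms and the `window` parameter below are assumptions made for illustration only, and the function names are hypothetical rather than taken from the thesis.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def triangle_normal(residue):
    """Unit normal of the plane through a residue's three atoms.

    `residue` is a tuple (N, CA, C) of 3D coordinates; this choice of
    atoms is an assumption, not something the abstract specifies.
    Collinear atoms give a zero-length normal and raise ZeroDivisionError.
    """
    n, ca, c = residue
    normal = _cross(_sub(n, ca), _sub(c, ca))
    length = math.sqrt(_dot(normal, normal))
    return tuple(x / length for x in normal)

def pair_features(res_i, res_j):
    """Return (CA-CA distance, angle in degrees between triangle normals)."""
    diff = _sub(res_i[1], res_j[1])
    distance = math.sqrt(_dot(diff, diff))
    cos_angle = _dot(triangle_normal(res_i), triangle_normal(res_j))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return distance, math.degrees(math.acos(cos_angle))

def local_feature_vector(residues, start, window=3):
    """Concatenate pair features between residue `start` and the next
    `window` residues (the window size is a hypothetical parameter)."""
    features = []
    for k in range(1, window + 1):
        features.extend(pair_features(residues[start], residues[start + k]))
    return features
```

Before vectors like these could be stored in a suffix tree they would need to be normalized and mapped to a discrete alphabet; the abstract does not describe that step, so it is left out here.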

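Bibliographic record

Author: Gao, Feng
Author affiliation: Rensselaer Polytechnic Institute
Degree-granting institution: Rensselaer Polytechnic Institute
Subject: Biology, Bioinformatics; Computer Science
Degree: Ph.D.
Year: 2006
Pages: 73 p.
Total pages: 73
Original format: PDF
Language: English
CLC classification: Automation technology, computer technology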

Similar literature

1. An assessment of the accuracy of methods for predicting hydrogen positions in protein structures. [J]. Forrest LR, Honig B. Proteins: Structure, Function, and Genetics. 2005, no. 2
2. Predicting ion binding properties for RNA tertiary structures. [J]. Tan ZJ, Chen SJ. Biophysical Journal. 2010, no. 5
3. Ab initio method for predicting tertiary structures of globular proteins. [J]. Kobayashi Y., Saito N., Sasabe H. Fluid Phase Equilibria. 1998, no. 1a2
4. Bhageerath: A web-enabled high performance computing software suite for predicting the tertiary structures of small globular proteins using all atom energy based ab initio methods. [C]. Shashank Shekhar, Priyanka Dhingra, Bharat Lakhani. International Conference on Bioinformatics Computational Biology. 2010
5. Computer modeling of protein tertiary structure and DNA binding energetics. I. Empirical free energy analysis of the engrailed Q50K variant-DNA complex and its mutants. II. The predicted structure of the adenovirus E4 orf6 protein by threading and comparative protein modeling. [D]. Brown, Lawrence Milton, III. 2001
6. Distance geometry generates native-like folds for small helical proteins using the consensus distances of predicted protein structures. [O]. E. S. Huang, R. Samudrala, J. W. Ponder. 1998
7. An ensemble method for predicting subnuclear localizations from primary protein structures. [O]. Guo Sheng Han, Zu Guo Yu, Vo Anh. 2013
