Advanced protein sequence analysis methods for structure and function prediction.

Abstract

As a result of the rapid accumulation of genomic data from high-throughput genomic projects, scientists face the enormous task of characterizing each protein encoded by these genomes in order to understand how these proteins function in making up a living cell. Experimental approaches for verifying protein structure and function are expensive, time-consuming, and in some cases inapplicable, which has motivated the development of computational methods for reliable, large-scale characterization of proteins.

In this dissertation, I present my research contributions in two major areas of protein sequence analysis. The first contribution is in the field of protein sequence comparison, where I developed a heuristic approach for comparing profile hidden Markov models based on their quasi-consensus sequences. This method, referred to as QC-COMP, is shown to be significantly faster and more accurate than COMPASS, the existing state-of-the-art method. In a related project, I built a web-based benchmark server for the Critical Assessment of Sequence alignment Accuracy.

As a second contribution, I developed an improved hidden Markov model for topology prediction and identification of integral membrane proteins. The resulting program, TMMOD, is a systematic modification of an existing model, TMHMM, that addresses key performance issues. In benchmark experiments, TMMOD shows significantly improved accuracy. In cross-validation experiments using a set of 83 transmembrane proteins with known topology, TMMOD outperformed TMHMM and other existing methods, with an accuracy of 89% for both topology and locations. In another experiment using a separate set of 160 transmembrane proteins, TMMOD achieved an accuracy of 84% for topology and 89% for locations.

Most computational methods for transmembrane protein topology prediction rely on the compositional bias of amino acids to locate the hydrophobic domains in transmembrane proteins. Because signal peptides also contain hydrophobic segments, these methods mistakenly identify signal peptides as transmembrane proteins. The SVM-Fisher discrimination approach was applied to further improve the ability of TMMOD to identify signal peptides as negatives. With this method, misprediction of signal peptides as membrane proteins was reduced by more than a third.
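To make the signal-peptide problem concrete, the following is a minimal sketch of a sliding-window hydropathy scan, assuming the standard Kyte-Doolittle scale. It is not TMMOD, TMHMM, or the SVM-Fisher method described in the abstract; it only illustrates how a purely composition-based detector can flag a signal-peptide-like sequence the same way it flags a transmembrane-like one. The window length (19), the threshold (1.6), the function names, and the three toy sequences are all illustrative assumptions, not data from the dissertation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Return the mean hydropathy of every full window along seq."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """Return start indices of windows whose mean hydropathy exceeds threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, h in enumerate(profile) if h > threshold]


# Hypothetical toy sequences, invented for illustration only.
tm_like = "KKRDE" + "LLIVALLVFAILLGVLLAV" + "RKDEN"        # long hydrophobic core
signal_like = "MKR" + "LLLLALLLAVLA" + "GASA" + "QDEKRTPS"  # short N-terminal core
soluble_like = "MSDEKRNQGSTPEDKRNQGSTDEKRS"                 # no hydrophobic stretch

for name, seq in [("tm_like", tm_like),
                  ("signal_like", signal_like),
                  ("soluble_like", soluble_like)]:
    hits = hydrophobic_windows(seq)
    print(name, "flagged" if hits else "not flagged", hits)
```

In this sketch both the transmembrane-like and the signal-peptide-like sequence contain at least one window above the threshold, while the hydrophilic sequence does not. A detector of this kind therefore needs an additional discrimination step to separate the first two cases, which is the role the abstract assigns to the SVM-Fisher approach.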
