Journal: Structure and Bonding

Data Mining for Protein Secondary Structure Prediction


Abstract

Accurate protein secondary structure prediction from the amino acid sequence is essential for almost all theoretical and experimental studies of protein structure and function. After a brief discussion of the application of data mining to the optimization of crystallization conditions for target proteins, we show that data mining of structural fragments of proteins from known structures in the Protein Data Bank (PDB) significantly improves the accuracy of secondary structure predictions. The original method, which we proposed a few years ago, was termed fragment database mining (FDM) (Cheng H, Sen TZ, Kloczkowski A, Margaritis D, Jernigan RL (2005) Prediction of protein secondary structure by mining structural fragment database. Polymer 46:4314-4321). This method gives excellent prediction accuracy if similar sequence fragments are available in our library of structural fragments, but is less successful if such fragments are absent from the fragment database. Recently we have improved secondary structure predictions further by combining FDM with classical GOR V predictions (Kloczkowski A, Ting KL, Jernigan RL, Garnier J (2002a) Combining the GOR V algorithm with evolutionary information for protein secondary structure prediction from amino acid sequence. Proteins 49:154-66; Sen TZ, Jernigan RL, Garnier J, Kloczkowski A (2005) GOR V server for protein secondary structure prediction. Bioinformatics 21:2787-8) to form a combined method, so-called consensus data mining (CDM) (Sen TZ, Cheng H, Kloczkowski A, Jernigan RL (2006) A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining. Protein Sci 15:2499-506). FDM mines the structural segments of the PDB and uses structural information from the matching sequence fragments to predict protein secondary structure. By combining it with the GOR V secondary structure prediction method, which is based on information theory and Bayesian statistics, coupled with evolutionary information from multiple sequence alignments (MSA), our CDM method guarantees improved prediction accuracy. Additionally, with the constant growth in the number of new protein structures and folds in the PDB, the accuracy of the CDM method is expected to increase in the future. We have developed a publicly available CDM server (Cheng H, Sen TZ, Jernigan RL, Kloczkowski A (2007) Consensus Data Mining (CDM) Protein Secondary Structure Prediction Server: combining GOR V and Fragment Database Mining (FDM). Bioinformatics 23:2628-30) for protein secondary structure prediction.
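The idea of letting matching fragments decide where they exist and falling back to a second predictor elsewhere can be illustrated with a minimal Python sketch. This is not the published FDM or CDM algorithm: the toy fragment library, the fixed window of 8 residues, the identity cutoff of 0.5, the identity-weighted voting rule, and the function names are all assumptions made for illustration. The `fallback` string stands in for per-residue output from another method such as GOR V, which is not implemented here.

```python
from collections import Counter

# Hypothetical toy fragment library: (sequence fragment, secondary structure).
# H = helix, E = strand, C = coil. These entries are invented, not PDB data.
FRAGMENT_LIBRARY = [
    ("AEELLKKA", "HHHHHHHH"),
    ("VKVTVEVG", "EEEEEECC"),
    ("GSPGDGSP", "CCCCCCCC"),
]


def identity(a, b):
    """Fraction of identical residues between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fragment_votes(sequence, library, window=8, min_identity=0.5):
    """Collect identity-weighted per-residue votes from matching fragments."""
    votes = [Counter() for _ in sequence]
    for start in range(len(sequence) - window + 1):
        query = sequence[start:start + window]
        for frag_seq, frag_ss in library:
            if len(frag_seq) != window:
                continue  # this sketch only compares equal-length windows
            score = identity(query, frag_seq)
            if score >= min_identity:
                # Each residue covered by the match receives a weighted vote.
                for offset, state in enumerate(frag_ss):
                    votes[start + offset][state] += score
    return votes


def consensus_predict(sequence, library, fallback, **kwargs):
    """Use the fragment vote where one exists, else the fallback prediction."""
    if len(fallback) != len(sequence):
        raise ValueError("fallback must have one state per residue")
    votes = fragment_votes(sequence, library, **kwargs)
    return "".join(
        v.most_common(1)[0][0] if v else fb
        for v, fb in zip(votes, fallback)
    )


if __name__ == "__main__":
    # The first 8 residues match a helical fragment; the last 2 use the fallback.
    print(consensus_predict("AEELLKKAGG", FRAGMENT_LIBRARY, "EEEEEEEEEE"))
```

In this sketch, residues with no sufficiently similar fragment simply inherit the fallback state, which mirrors the observation in the abstract that fragment mining alone is less successful when similar fragments are absent from the database.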
