Published in: Protein Science: A Publication of the Protein Society

A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining.

Abstract

The major aim of tertiary structure prediction is to obtain protein models with the highest possible accuracy. Fold recognition, homology modeling, and de novo prediction methods typically use predicted secondary structures as input, and all of these methods may significantly benefit from more accurate secondary structure predictions. Although there are many different secondary structure prediction methods available in the literature, their cross-validated prediction accuracy is generally <80%. In order to increase the prediction accuracy, we developed a novel hybrid algorithm called Consensus Data Mining (CDM) that combines our two previous successful methods: (1) Fragment Database Mining (FDM), which exploits the Protein Data Bank structures, and (2) GOR V, which is based on information theory, Bayesian statistics, and multiple sequence alignments (MSA). In CDM, the target sequence is dissected into smaller fragments that are compared with fragments obtained from related sequences in the PDB. For fragments with a sequence identity above a certain sequence identity threshold, the FDM method is applied for the prediction. The remainder of the fragments are predicted by GOR V. The results of the CDM are provided as a function of the upper sequence identities of aligned fragments and the sequence identity threshold. We observe that the value 50% is the optimum sequence identity threshold, and that the accuracy of the CDM method measured by Q(3) ranges from 67.5% to 93.2%, depending on the availability of known structural fragments with sufficiently high sequence identity. As the Protein Data Bank grows, it is anticipated that this consensus method will improve because it will rely more upon the structural fragments.
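
The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Fragment` type, its field names, the function names, and the choice to let later fragments overwrite earlier ones where they overlap are all assumptions made for illustration. The sketch starts from a per-residue GOR V prediction, substitutes the FDM prediction wherever a fragment's best sequence identity exceeds the threshold (50% is the optimum reported above), and scores the result with Q(3), taken here as the percentage of residues whose predicted three-state label (H, E, or C) matches the observed one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fragment:
    """Hypothetical record for one fragment of the target sequence."""
    start: int            # index of the fragment's first residue in the target
    fdm_pred: str         # assumed FDM prediction for the fragment, one of H/E/C per residue
    best_identity: float  # highest sequence identity (0-100) to an aligned PDB fragment


def consensus_predict(gor_pred: str, fragments: List[Fragment],
                      threshold: float = 50.0) -> str:
    """Combine the two predictions under the assumed rule.

    Residues keep their GOR V state unless they are covered by a fragment
    whose best sequence identity is above the threshold, in which case the
    FDM state is used instead. Overlapping fragments are resolved by letting
    later fragments overwrite earlier ones (an assumption, not from the source).
    """
    result = list(gor_pred)
    for frag in fragments:
        if frag.best_identity > threshold:
            for offset, state in enumerate(frag.fdm_pred):
                position = frag.start + offset
                if 0 <= position < len(result):  # ignore residues outside the target
                    result[position] = state
    return "".join(result)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    gor = "CCCCCCCCCC"                      # made-up GOR V prediction
    frags = [
        Fragment(start=2, fdm_pred="HHHH", best_identity=80.0),  # above threshold: FDM used
        Fragment(start=6, fdm_pred="EEE", best_identity=30.0),   # below threshold: GOR V kept
    ]
    combined = consensus_predict(gor, frags)
    print(combined)
    print(q3(combined, "CCHHHHEEEC"))
```

The toy strings are invented inputs, not data from the paper. They show why the accuracy depends on fragment availability: the residues covered only by the low-identity fragment fall back to GOR V, and in this made-up case those are the residues that are mispredicted.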