Source: U.S. National Institutes of Health literature, BMC Bioinformatics

Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach

Abstract

Background: Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods developed on large datasets, the estimated upper limit of accuracy has yet to be reached. Since the predictions of SSP methods are used as input to higher-level structure prediction pipelines, even small errors may cause large perturbations in the final models. Previous works relied on cross-validation as an estimate of classifier accuracy. However, training on large numbers of protein chains compromises the classifier's ability to generalize to new sequences. This prompts a novel approach to training and an investigation into the possible structural factors that lead to poor predictions.

Here, a small group of 55 proteins, termed the compact model, is selected from the CB513 dataset using a heuristics-based approach. In a prior work, all sequences were represented as probability matrices of residues adopting each of the Helix, Sheet and Coil states, based on energy calculations using the C-Alpha, C-Beta, Side-chain (CABS) algorithm. The functional relationship between the conformational energies computed with the CABS force field and the residue states is approximated using a classifier termed the Fully Complex-valued Relaxation Network (FCRN). The FCRN is trained with the compact model proteins.
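To make the three-state representation concrete, the following minimal sketch shows how a per-residue probability matrix over Helix, Sheet and Coil could be turned into a predicted state string. It is an illustration only: the function name `assign_states`, the one-letter codes H/E/C, the simple highest-probability rule and the example numbers are all assumptions made here, not the CABS energy calculation or the FCRN classifier described in the abstract.

```python
# Hypothetical illustration: names, codes and numbers are assumptions,
# not taken from the paper. Columns are ordered Helix, Sheet, Coil.
STATES = ("H", "E", "C")


def assign_states(prob_matrix):
    """Return one state letter per residue by picking the most probable column."""
    labels = []
    for row in prob_matrix:
        if len(row) != len(STATES):
            raise ValueError("each row needs one probability per state")
        # Index of the largest probability in this residue's row.
        best = max(range(len(STATES)), key=lambda i: row[i])
        labels.append(STATES[best])
    return "".join(labels)


# Made-up probabilities for a four-residue fragment.
example = [
    [0.7, 0.1, 0.2],
    [0.6, 0.1, 0.3],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]
print(assign_states(example))  # HHEC
```

In the work summarized above, the mapping from energy-derived features to states is learned by the FCRN; the fixed highest-probability rule here only shows the shape of the input and output.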