...
BMC Bioinformatics

On the amyloid datasets used for training PAFIG: how (not) to extend the experimental dataset of hexapeptides

Abstract

Background
Amyloids are proteins capable of forming aberrant intramolecular contact sites characteristic of the beta-zipper configuration. Amyloids can underlie serious health conditions, e.g. Alzheimer's or Parkinson's disease. It has been proposed that short segments of amino acids can be responsible for protein amyloidogenicity, but no more than two hundred such hexapeptides have been found experimentally. The authors of the computational tool Pafig published, in BMC Bioinformatics, a method for extending the amyloid hexapeptide dataset so that it could be used for training and testing models. They assumed that all hexapeptides belonging to an amyloid protein can be regarded as amylopositive, while all hexapeptides from proteins never reported as amyloid are amylonegative. Here we show why this method of extending datasets is wrong and discuss why the incorrect data can nevertheless yield a classification that only appears correct.

Results
The amyloid classification of hexapeptides by Pafig was compared with the classification results of several state-of-the-art computational methods, and the outputs of all methods were studied by clustering analysis. The clustering shows that Pafig is an outlier with respect to the other approaches. Our study of the statistical patterns in its training and testing datasets revealed a strong bias towards the STVIIE hexapeptide in their positive parts. The different statistical patterns of the seemingly amylopositive and amylonegative hexapeptides allow for a repeatable classification that is unrelated to the amyloid propensity of the hexapeptides.

Conclusions
Our study on the recognition of amyloid hexapeptides showed that incidental patterns in wrongly selected datasets can produce classification results that only appear correct.
The assumption that all hexapeptides belonging to an amyloid protein can be regarded as amylopositive, and that all hexapeptides from proteins never reported as amyloid are amylonegative, is not supported by any other computational method. This is in line with experimental observations that the amyloid propensity of a whole protein can result from a single amyloidogenic fragment, while an amyloidogenic segment that is well hidden inside a protein may never lead to fibril formation. We therefore conclude that Pafig does not provide a correct classification with respect to amyloidogenicity.
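The dataset-extension scheme criticised above can be made concrete with a short sketch. It assumes a sliding window of six residues with a step of one and labels every window with the reported amyloid status of its source protein; the function names and both toy sequences are hypothetical and are not taken from Pafig or its datasets. For illustration only, the toy "amyloid" protein is assumed to owe its amyloidogenicity to a single embedded segment, the STVIIE hexapeptide named in the abstract.

```python
from typing import Dict, List, Tuple

WINDOW = 6  # hexapeptide length


def hexapeptides(sequence: str, window: int = WINDOW) -> List[str]:
    """All overlapping windows of `window` residues, sliding one residue at a time."""
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


def label_by_source_protein(proteins: Dict[str, Tuple[str, bool]]) -> List[Tuple[str, int]]:
    """Assumed scheme: each hexapeptide inherits its protein's amyloid status (1 or 0)."""
    labelled: List[Tuple[str, int]] = []
    for sequence, reported_amyloid in proteins.values():
        label = 1 if reported_amyloid else 0
        labelled.extend((hexa, label) for hexa in hexapeptides(sequence))
    return labelled


# Hypothetical toy input: one "amyloid" protein whose only amyloidogenic segment is
# assumed to be the embedded STVIIE, and one protein never reported as amyloid.
proteins = {
    "toy_amyloid": ("MKDGNSTVIIEGQPRW", True),
    "toy_other": ("MAHKLPEDGRSA", False),
}

dataset = label_by_source_protein(proteins)
positives = [hexa for hexa, label in dataset if label == 1]
print(len(dataset), len(positives))                 # 18 11
print(sum(hexa == "STVIIE" for hexa in positives))  # 1
```

Of the 11 windows labelled positive, only one is the assumed amyloidogenic fragment; the other ten are positive solely because of the protein they come from. The same mechanism works in reverse: a buried amyloidogenic hexapeptide in a protein never reported as amyloid would be labelled negative by this scheme.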