Journal: BMC Bioinformatics

Grading amino acid properties increased accuracies of single point mutation on protein stability prediction



Abstract

Background
Protein stability can sometimes be affected by point mutations introduced into the protein. In current machine learning approaches that predict protein stability from sequence information, the encoding schemes include sparse encoding and amino acid property encoding. Property encoding schemes use physicochemical information about the mutated protein environment; however, they also add complexity when many properties are combined in one scheme. This complexity introduces noise that lowers the accuracy of machine learning algorithms. To overcome this problem, we describe a new encoding scheme that grades the twenty amino acids into groups according to their values for a specific property.

Results
We used three predefined values, 0.1, 0.5, and 0.9, to represent the 'weak', 'middle', and 'strong' groups for each amino acid property, and introduced two thresholds per property to assign each of the twenty amino acids to one of the three groups according to its property value. For each property, an amino acid can therefore take only one of three predefined values rather than one of twenty different values. This reduces the complexity and noise of the encoding scheme. The graded amino acid property encoding schemes improved average accuracy by more than 7% in 20-fold cross-validation. Starting from sequence information and using three-state prediction definitions, our method reaches an overall accuracy of more than 72% on the independent test sets.

Conclusions
Grading the numeric values of amino acid properties can reduce the noise and complexity of the input information. The approach is consistent with biochemical concepts of amino acid properties and, at the same time, simplifies the input data. The idea of graded property encoding may also be applied to other protein-related predictions that use machine learning approaches.
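The grading step described in the Results can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract names neither the properties used nor the threshold values, so the sketch takes the Kyte-Doolittle hydropathy scale as an example property, uses hypothetical thresholds of -1.0 and 1.0, and adds a hypothetical helper, `encode_window`, that turns a point mutation into a feature vector (wild-type grade, mutant grade, then the grades of neighbouring residues).

```python
# Illustrative sketch only. Kyte-Doolittle hydropathy is an example property;
# the thresholds and the window layout below are hypothetical, not from the paper.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# The three predefined grade values named in the abstract.
WEAK, MIDDLE, STRONG = 0.1, 0.5, 0.9


def grade_property(values, low, high):
    """Map each amino acid's raw property value to 0.1, 0.5 or 0.9.

    Values below `low` are 'weak', values above `high` are 'strong', and
    values in between are 'middle'. `low` and `high` are the two
    per-property thresholds (hypothetical here).
    """
    if low >= high:
        raise ValueError("low threshold must be below high threshold")
    graded = {}
    for aa, v in values.items():
        if v < low:
            graded[aa] = WEAK
        elif v > high:
            graded[aa] = STRONG
        else:
            graded[aa] = MIDDLE
    return graded


def encode_window(sequence, position, new_residue, graded, half_width=3):
    """Hypothetical feature layout for one point mutation.

    Returns the wild-type and mutant grades followed by the grades of
    `half_width` neighbours on each side; positions beyond the sequence
    ends are padded with 0.0.
    """
    features = [graded[sequence[position]], graded[new_residue]]
    for offset in range(-half_width, half_width + 1):
        if offset == 0:
            continue  # the mutated position is already encoded above
        i = position + offset
        features.append(graded[sequence[i]] if 0 <= i < len(sequence) else 0.0)
    return features


hydropathy = grade_property(KYTE_DOOLITTLE, low=-1.0, high=1.0)
print(hydropathy["I"], hydropathy["G"], hydropathy["R"])  # 0.9 0.5 0.1
print(encode_window("MKVLIAG", 3, "D", hydropathy))
```

With these thresholds, isoleucine grades as 'strong' (0.9), glycine as 'middle' (0.5) and arginine as 'weak' (0.1). Each property then contributes one of only three values per residue instead of twenty distinct raw values, which is the reduction in input complexity the abstract describes.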
