Journal: BMC Bioinformatics

Application of Fourier transform and proteochemometrics principles to protein engineering

Abstract

Connecting a protein's sequence to its function is of fundamental interest to protein engineers. In silico methods are useful in this quest, especially when structural information is not available. In this study we propose a mutant library screening tool called iSAR (innovative Sequence Activity Relationship) that relies on the physicochemical properties of the amino acids, digital signal processing and partial least squares regression to uncover these sequence-function correlations. We show that a digitized representation of the protein sequence in the form of a Fourier spectrum can be used as an efficient descriptor for modeling the sequence-activity relationship of proteins. The iSAR methodology identifies high-fitness mutants in mutant libraries by correlating the spectral variations caused by mutations with biological activity or fitness. It takes into account the impact of mutations on the whole spectrum rather than focusing on local fitness alone. The utility of the method is illustrated on four datasets: cytochrome P450 for thermostability, TNF-alpha for binding affinity, GLP-2 for potency and enterotoxins for thermostability. The datasets were chosen to illustrate the ability of the method to perform when limited training data is available and when the test set contains novel mutations that do not appear in the training set. The combination of Fast Fourier Transform and Partial Least Squares regression is efficient in capturing the effects of mutations on the function of the protein. iSAR is a fast algorithm that can be implemented with limited computational resources and can make effective predictions even when the training set is small.
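
The pipeline outlined in the abstract (numerical encoding of the sequence, Fourier transform, partial least squares regression) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which physicochemical property iSAR uses, so the Kyte-Doolittle hydropathy scale is assumed here; the helper names `spectrum`, `pls1_fit` and `pls1_predict` are hypothetical; and the sequences and activity values are synthetic stand-ins for a real mutant library.

```python
import numpy as np

# ASSUMPTION: Kyte-Doolittle hydropathy is used as the physicochemical scale.
# The abstract does not name the property that iSAR actually relies on.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def spectrum(seq):
    """Encode a sequence as a numeric signal and return its FFT magnitudes."""
    signal = np.array([HYDROPATHY[aa] for aa in seq], dtype=float)
    return np.abs(np.fft.rfft(signal))


def pls1_fit(X, y, n_components):
    """Single-response PLS regression (PLS1) using the NIPALS iteration."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # direction of maximal covariance with y
        norm = np.linalg.norm(w)
        if norm < 1e-10:                 # nothing left in y that X can explain
            break
        w = w / norm
        t = Xk @ w                       # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        c = yk @ t / tt                  # y loading
        Xk = Xk - np.outer(t, p)         # deflate X
        yk = yk - c * t                  # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector in feature space
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


# Synthetic demo data (NOT from the paper): random 16-residue sequences whose
# "activity" is constructed as an exact linear function of the spectrum.
rng = np.random.default_rng(0)
letters = list(HYDROPATHY)
seqs = ["".join(rng.choice(letters, size=16)) for _ in range(40)]
X = np.array([spectrum(s) for s in seqs])
y = X @ rng.normal(size=X.shape[1])

# Train on 30 sequences, predict the 10 held-out ones.
model = pls1_fit(X[:30], y[:30], n_components=X.shape[1])
predicted = pls1_predict(model, X[30:])
print("max abs error on held-out set:", np.max(np.abs(predicted - y[30:])))
```

Because the synthetic activity is constructed to be linear in the spectrum, the held-out error here says nothing about performance on real mutant libraries. In practice, a library implementation such as scikit-learn's `PLSRegression` would likely replace the hand-written NIPALS loop, and the number of components would be chosen by cross-validation.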
