Journal: Analytical Chemistry

Molecular Descriptor Subset Selection in Theoretical Peptide Quantitative Structure-Retention Relationship Model Development Using Nature-Inspired Optimization Algorithms


Abstract

In this work, the performance of five nature-inspired optimization algorithms, genetic algorithm (GA), particle swarm optimization (PSO), artificial bee colony (ABC), firefly algorithm (FA), and flower pollination algorithm (FPA), was compared in molecular descriptor selection for the development of quantitative structure-retention relationship (QSRR) models for 83 peptides originating from eight model proteins. A matrix of 423 descriptors was used as input. QSRR models based on the selected descriptors were built using partial least squares (PLS), and the root mean square error of prediction (RMSEP) served as the fitness function for descriptor selection. Three performance criteria, prediction accuracy, computational cost, and the number of selected descriptors, were used to evaluate the developed QSRR models. The results show that all five variable selection methods outperform interval PLS (iPLS), sparse PLS (sPLS), and the full PLS model, and that GA is superior because it has the lowest computational cost and higher accuracy (RMSEP of 5.534%) with a smaller number of variables (nine descriptors). The GA-QSRR model was first validated through Y-randomization. In addition, it was successfully validated with an external test set of 102 peptides originating from Bacillus subtilis proteomes (RMSEP of 22.030%). Its applicability domain was defined, from which it was evident that the developed GA-QSRR model is strongly robust. All sources of the model's error were identified, allowing further application of the developed methodology in proteomics.
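
The selection scheme summarized in the abstract (a GA searching over descriptor subsets, with the RMSEP of a PLS model as the fitness to minimize) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`pls1_rmsep`, `ga_select`, `make_toy_data`), the synthetic data, the single-component PLS, and all GA settings (truncation selection, one-point crossover, bit-flip mutation, population size, rates) are hypothetical choices made for the example; the abstract does not specify them.

```python
import math
import random


def pls1_rmsep(X_train, y_train, X_test, y_test, mask):
    """RMSEP of a one-component PLS model using only the descriptors in mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return float("inf")  # an empty subset cannot predict anything
    n = len(X_train)
    x_mean = [sum(row[j] for row in X_train) / n for j in cols]
    y_mean = sum(y_train) / n
    Xc = [[row[j] - m for j, m in zip(cols, x_mean)] for row in X_train]
    yc = [v - y_mean for v in y_train]
    # PLS weight vector: proportional to X^T y on the centered training data
    w = [sum(Xc[i][k] * yc[i] for i in range(n)) for k in range(len(cols))]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        return float("inf")
    w = [v / norm for v in w]
    # Scores t = X w, then regress y on the single latent variable
    t = [sum(x * wk for x, wk in zip(row, w)) for row in Xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    sq_err = 0.0
    for row, target in zip(X_test, y_test):
        score = sum((row[j] - m) * wk for j, m, wk in zip(cols, x_mean, w))
        sq_err += (y_mean + q * score - target) ** 2
    return math.sqrt(sq_err / len(X_test))


def ga_select(fitness, n_features, pop_size=30, generations=40,
              p_mut=0.05, p_init=0.3, seed=0):
    """Minimize fitness(mask) over boolean descriptor masks with a simple GA."""
    rng = random.Random(seed)
    pop = [[rng.random() < p_init for _ in range(n_features)]
           for _ in range(pop_size)]
    pop[0] = [True] * n_features  # assumption: start with the full model as one candidate
    best_mask, best_fit = None, float("inf")
    for gen in range(generations):
        scored = sorted(((fitness(m), m) for m in pop), key=lambda p: p[0])
        if scored[0][0] < best_fit:
            best_fit, best_mask = scored[0][0], scored[0][1][:]
        if gen == generations - 1:
            break  # no need to breed after the final evaluation
        parents = [m for _, m in scored[: pop_size // 2]]  # truncation selection
        children = [best_mask[:]]  # elitism: carry the best subset forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return best_mask, best_fit


def make_toy_data(n_samples=60, n_features=12, seed=1):
    """Synthetic stand-in for a descriptor matrix; only columns 0 and 3 matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [2.0 * r[0] - 1.5 * r[3] + rng.gauss(0, 0.1) for r in X]
    return X, y


X, y = make_toy_data()
X_tr, y_tr, X_te, y_te = X[:40], y[:40], X[40:], y[40:]


def fitness(mask):
    return pls1_rmsep(X_tr, y_tr, X_te, y_te, mask)


mask, rmsep = ga_select(fitness, n_features=12)
full_rmsep = fitness([True] * 12)
print("selected descriptor indices:", [j for j, keep in enumerate(mask) if keep])
print("RMSEP (selected subset):", rmsep)
print("RMSEP (all descriptors):", full_rmsep)
```

Because the full descriptor set is included in the initial population and the best subset is never discarded, the returned RMSEP cannot exceed that of the full model on the same split. In a real study the subset chosen this way would still need checking on data not used during selection, which is the role the Y-randomization and external test set play in the abstract.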
