BMC Bioinformatics

A review of machine learning methods to predict the solubility of overexpressed recombinant proteins in Escherichia coli

Abstract

Background
Over the last 20 years, the production of recombinant proteins has been a crucial bioprocess in biotechnology, in both the biopharmaceutical and research arenas, in terms of human health, scientific impact and economic volume. Although rational genetic-engineering strategies have been established, protein overexpression is still an art. In particular, heterologous expression is often hindered by low production levels and frequently fails for unclear reasons. The problem is accentuated because no generic solution is available to enhance heterologous overexpression. For a given protein, the extent of its solubility can indicate the quality of its function. Over 30% of synthesized proteins are not soluble. Under given experimental conditions, including temperature and expression host, protein solubility is ultimately determined by the protein's sequence. To date, numerous machine-learning-based methods have been proposed to predict the solubility of a protein from its amino acid sequence alone. Despite 20 years of research on the matter, no comprehensive review of the published methods is available.

Results
This paper presents an extensive review of the existing models for predicting protein solubility in the Escherichia coli recombinant protein overexpression system. The models are investigated and compared with respect to the datasets used, features, feature selection methods, machine learning techniques and prediction accuracy. A discussion of the models is provided at the end.

Conclusions
This study aims to investigate extensively the machine-learning-based methods for predicting recombinant protein solubility, so as to offer both a general and a detailed understanding for researchers in the field. Some of the models show acceptable prediction performance and offer convenient user interfaces. These models can be considered valuable tools for predicting recombinant protein overexpression results before performing real laboratory experiments, thus saving labour, time and cost.
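The surveyed methods predict solubility from the amino acid sequence alone, so every model starts by turning a sequence into a numeric feature vector. The following is a minimal sketch of that first step, assuming a simple, hypothetical feature set (amino acid composition, length, fraction of charged residues and a crude net charge per residue). It is not taken from the review or from any specific published predictor; the function name `sequence_features` is illustrative, and the charge term deliberately ignores histidine and the chain termini.

```python
from collections import Counter
from typing import Dict

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(sequence: str) -> Dict[str, float]:
    """Compute simple sequence-only features (hypothetical feature set).

    Returns the fraction of each standard amino acid, the sequence length,
    the fraction of charged residues (D, E, K, R) and a crude net charge
    per residue that ignores His and the termini.
    """
    seq = sequence.upper()
    # Keep only standard residues; non-standard letters (X, B, Z, ...) are dropped.
    residues = [aa for aa in seq if aa in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")

    counts = Counter(residues)
    # Amino acid composition: 20 fractions that sum to 1.
    features = {f"frac_{aa}": counts[aa] / n for aa in AMINO_ACIDS}
    features["length"] = float(n)

    # Fraction of residues that are typically charged at neutral pH.
    charged = counts["D"] + counts["E"] + counts["K"] + counts["R"]
    features["frac_charged"] = charged / n

    # Rough net charge per residue: basic (K, R) minus acidic (D, E).
    features["net_charge_per_residue"] = (
        counts["K"] + counts["R"] - counts["D"] - counts["E"]
    ) / n
    return features


if __name__ == "__main__":
    feats = sequence_features("MKKLLDE")
    print(feats["length"], feats["frac_charged"], feats["net_charge_per_residue"])
```

In a real predictor, vectors like these would be computed for proteins with experimentally known soluble/insoluble outcomes and used to train a classifier; which features, feature selection methods and learners work best is what the review compares across models.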