
Discrimination of soluble and aggregation-prone proteins based on sequence information



Abstract

Understanding the factors governing protein solubility is key to grasping its underlying mechanisms and may provide insight into protein aggregation and misfolding-related diseases such as Alzheimer's disease. In this work, we attempt to identify factors important to protein solubility using feature selection. First, we calculate 1438 features, including physicochemical properties and statistics, for each protein. The Random Forest algorithm is then used to select a minimal, most informative subset of features based on their predictive performance. A predictive model is built on the 17 selected features. Compared with previous models, our model achieves better performance, with a sensitivity of 0.82, specificity of 0.85, accuracy (ACC) of 0.84, area under the ROC curve (AUC) of 0.91, and Matthews correlation coefficient (MCC) of 0.67. Furthermore, a model trained on a redundancy-reduced dataset (sequence identity <= 30%) achieves the same performance as the model without redundancy reduction. Our results provide not only a reliable model for predicting protein solubility but also a list of features important to protein solubility. The predictive model is implemented as a freely available web application at http://shark.abl.ku.edu/ProS/.
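To make the reported metrics concrete, the following is a minimal Python sketch of how sensitivity, specificity, ACC, MCC, and AUC are conventionally computed for a binary classifier. It is not the authors' code: the function names, the toy labels and scores, the 0.5 decision threshold, and the choice of which class is coded as 1 are all assumptions made for illustration.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Standard definitions of sensitivity, specificity, ACC and MCC."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "acc": accuracy, "mcc": mcc}

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (rank-based definition)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 1 = positive class, 0 = negative class (assumed coding).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(*confusion_counts(y_true, y_pred))
metrics["auc"] = auc(y_true, scores)
print(metrics)
```

Sensitivity, specificity, ACC, and MCC depend on the chosen decision threshold, whereas AUC is computed from the raw scores and summarizes ranking quality across all thresholds.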

Publication details

  • Source
    Molecular BioSystems, 2013, Issue 4, pp. 806-811 (6 pages)
  • Authors

    Yaping Fang; Jianwen Fang

  • Affiliation

    Applied Bioinformatics Laboratory, The University of Kansas, 2034 Becker Dr., Lawrence, Kansas 66047, USA (both authors)

  • Original format: PDF
  • Language: English
