Journal of Applied Crystallography

Logistic regression models to predict solvent accessible residues using sequence- and homology-based qualitative and quantitative descriptors applied to a domain-complete X-ray structure learning set


Abstract

A working example of relative solvent accessibility (RSA) prediction for proteins is presented. Novel logistic regression models with various qualitative descriptors that include amino acid type and quantitative descriptors that include 20- and six-term sequence entropy have been built and validated. A domain-complete learning set of over 1300 proteins is used to fit initial models with various sequence homology descriptors as well as query residue qualitative descriptors. Homology descriptors are derived from BLASTp sequence alignments, whereas the RSA values are determined directly from the crystal structure. The logistic regression models are fitted using dichotomous responses indicating buried or accessible solvent, with binary classifications obtained from the RSA values. The fitted models determine binary predictions of residue solvent accessibility with accuracies comparable to other less computationally intensive methods using the standard RSA threshold criteria 20 and 25% as solvent accessible. When an additional non-homology descriptor describing Lobanov–Galzitskaya residue disorder propensity is included, incremental improvements in accuracy are achieved with 25% threshold accuracies of 76.12 and 74.79% for the Manesh-215 and CASP(8+9) test sets, respectively. Moreover, the described software and the accompanying learning and validation sets allow students and researchers to explore the utility of RSA prediction with simple, physically intuitive models in any number of related applications.
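The pipeline described in the abstract (sequence entropy descriptors from aligned homologues, a binary buried/accessible response obtained by thresholding RSA, and a logistic regression fit) can be illustrated with a minimal sketch. This is not the authors' software. The six-class residue grouping, the `>=` threshold convention, the function names, and the toy alignment columns and RSA values are all assumptions made for illustration, and plain gradient descent stands in for whatever fitting procedure the paper actually uses.

```python
import math
from collections import Counter

# Assumed six-class grouping of the 20 amino acids (hypothetical; the
# paper's own grouping is not given in the abstract).
SIX_CLASSES = {
    "aliphatic": "AVLIMC", "aromatic": "FWYH", "polar": "STNQ",
    "positive": "KR", "negative": "DE", "special": "GP",
}
RESIDUE_TO_CLASS = {aa: name for name, members in SIX_CLASSES.items() for aa in members}

def shannon_entropy(symbols):
    """Shannon entropy in bits of an iterable of hashable symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def column_entropies(column):
    """Return (20-term, six-term) entropy for one alignment column."""
    residues = [aa for aa in column if aa in RESIDUE_TO_CLASS]  # skip gaps/unknowns
    if not residues:
        return 0.0, 0.0
    return (shannon_entropy(residues),
            shannon_entropy(RESIDUE_TO_CLASS[aa] for aa in residues))

def is_accessible(rsa_percent, threshold=25.0):
    """Binary response: 1 = accessible, 0 = buried (>= is an assumed convention)."""
    return 1 if rsa_percent >= threshold else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Predicted class at a 0.5 probability cut-off."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0

# Toy data invented for illustration: (alignment column, RSA in percent).
TOY = [("LLLLLLLL", 3.0), ("VVIVLVVI", 10.0), ("FFFYFFWF", 8.0),
       ("KRDESTNQ", 60.0), ("KEKDGSNP", 45.0), ("AGSTAPEK", 30.0)]
X = [list(column_entropies(col)) for col, _ in TOY]
y = [is_accessible(rsa, threshold=25.0) for _, rsa in TOY]
w, b = fit_logistic(X, y)
print([predict(w, b, x) for x in X], y)
```

In this toy set the conserved columns are the buried ones, so the two entropy features alone separate the classes. The models in the abstract combine such homology descriptors with qualitative descriptors such as amino acid type, which this sketch leaves out.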
