Journal of Molecular Biology

Identification of DNA-binding proteins using structural, electrostatic and evolutionary features.



Abstract

DNA-binding proteins (DBPs) participate in many crucial processes in the cell life cycle, so identifying and characterizing these proteins is of great importance. We present a random forests classifier for identifying DBPs among proteins with known 3D structures. First, clusters of evolutionarily conserved regions (patches) on the protein surface were detected using the PatchFinder algorithm; earlier studies showed that these regions are typically the functionally important regions of proteins. Next, we trained a classifier using features such as the electrostatic potential, cluster-based amino acid conservation patterns and secondary structure content of the patches, as well as features of the whole protein, including its dipole moment. In 10-fold cross-validation on a dataset of 138 DBPs and 110 proteins that do not bind DNA, the classifier achieved a sensitivity and a specificity of 0.90, overall outperforming previously published methods. Furthermore, when we tested five different methods on 11 new DBPs that did not appear in the original dataset, only our method annotated all of them correctly. The resulting classifier was then applied to a collection of 757 proteins of known structure and unknown function. Of these, 218 were predicted to bind DNA, and we anticipate that some of them interact with DNA through new structural motifs. Analyses with complementary computational tools support the notion that at least some of them do bind DNA.
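As a rough illustration of the evaluation described above, the following Python sketch shows how the 138 DBPs and 110 non-binding proteins could be split into 10 class-stratified folds, and how sensitivity and specificity are computed from the pooled held-out predictions. The helper names and the round-robin stratification scheme are assumptions made for illustration only; the abstract does not say how the folds were built, and the random forests classifier itself is not reproduced here.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int = 10) -> List[int]:
    """Hypothetical helper: assign each sample a fold index in 0..k-1.

    Samples of each class are dealt round-robin across the k folds, so every
    fold receives a near-equal share of DBPs (label 1) and non-binders
    (label 0). A real pipeline would normally shuffle the samples first.
    """
    seen = Counter()  # how many samples of each class have been assigned
    fold_of = []
    for y in labels:
        fold_of.append(seen[y] % k)
        seen[y] += 1
    return fold_of


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Label 1 marks a DNA-binding protein, label 0 a non-binder.
    Assumes both classes occur in y_true (otherwise a denominator is zero).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Class sizes taken from the abstract: 138 DBPs and 110 non-binders.
    labels = [1] * 138 + [0] * 110
    folds = stratified_folds(labels, k=10)
    for f in range(10):
        pos = sum(1 for y, g in zip(labels, folds) if g == f and y == 1)
        neg = sum(1 for y, g in zip(labels, folds) if g == f and y == 0)
        print(f"fold {f}: {pos} DBPs, {neg} non-binders")
```

With these class sizes each fold holds 14 or 13 DBPs and 11 non-binders. In a full pipeline, a classifier would be trained on nine folds and used to predict the tenth, and the out-of-fold predictions from all ten rounds would then be passed to `sensitivity_specificity`.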
