Conference: Asia-Pacific Bioinformatics Conference

Predicting and analyzing DNA-binding domains using a systematic approach to identifying a set of informative physicochemical and biochemical properties

Abstract

Background: Existing methods for predicting DNA-binding proteins use features derived from physicochemical properties to design support vector machine (SVM) based classifiers. The selection of physicochemical properties and the determination of their corresponding feature vectors generally rely on known properties of the binding mechanism and on the experience of the designers. This creates a problem for designers: some distinct physicochemical properties have similar vectors representing the 20 amino acids, while some closely related physicochemical properties have dissimilar vectors.

Results: This study proposes a systematic approach (named Auto-IDPCPs) that automatically identifies a set of physicochemical and biochemical properties in the AAindex database to design SVM-based classifiers for predicting and analyzing DNA-binding domains/proteins. Auto-IDPCPs consists of 1) clustering 531 amino acid indices in AAindex into 20 clusters using a fuzzy c-means algorithm, 2) using IBCGA, an efficient optimization method based on a genetic algorithm, to select an informative feature set of size m to represent sequences, and 3) analyzing the selected features to identify related physicochemical properties that may affect the binding mechanism of DNA-binding domains/proteins. For predicting DNA-binding domains, Auto-IDPCPs identified m=22 features of properties belonging to five clusters and reached a five-fold cross-validation accuracy of 87.12%, slightly higher than the 86.62% of the existing method PSSM-400. For predicting DNA-binding sequences, an accuracy of 75.50% was obtained using m=28 features, compared with 74.22% for PSSM-400. On an independent test data set of DNA-binding domains, Auto-IDPCPs and PSSM-400 achieved accuracies of 80.73% and 82.81%, respectively. Typical physicochemical properties discovered include hydrophobicity, secondary structure, charge, solvent accessibility, polarity, flexibility, normalized van der Waals volume, and pK (pK-C, pK-N, pK-COOH and pK-a(RCOOH)).

Conclusions: The proposed approach Auto-IDPCPs can help designers investigate informative physicochemical and biochemical properties by considering prediction accuracy and analysis of the binding mechanism simultaneously. The approach may also be applicable to predicting and analyzing other protein functions from sequences.
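The first step of Auto-IDPCPs, grouping amino acid indices with a fuzzy c-means algorithm, can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption: the function names `fuzzy_c_means` and `hard_labels`, the fuzzifier value of 2.0, the random initialization, and the stopping rule are all hypothetical choices. The input vectors are synthetic 3-dimensional toy data, not AAindex values (a real index would be a 20-dimensional vector, one value per amino acid).

```python
import math
import random


def fuzzy_c_means(points, c, fuzzifier=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length float vectors.

    Returns (centers, memberships), where memberships[i][k] is the degree
    to which point i belongs to cluster k (each row sums to 1).
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of all points weighted by membership^fuzzifier.
        centers = []
        for k in range(c):
            w = [u[i][k] ** fuzzifier for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update from inverse distances to the new centers.
        shift = 0.0
        for i in range(n):
            dists = [math.dist(points[i], centers[k]) for k in range(c)]
            if min(dists) == 0.0:
                # Point sits exactly on a center: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                new_row = [h / total for h in hits]
            else:
                inv = [d ** (-2.0 / (fuzzifier - 1.0)) for d in dists]
                total = sum(inv)
                new_row = [v / total for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        if shift < tol:  # memberships have stopped changing
            break
    return centers, u


def hard_labels(memberships):
    """Assign each point to the cluster with its highest membership."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in memberships]


# Synthetic toy vectors (NOT real AAindex data): two well-separated groups.
TOY_VECTORS = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.1],
    [5.0, 5.1, 5.0], [5.1, 5.0, 4.9], [4.9, 5.0, 5.0],
]

centers, memberships = fuzzy_c_means(TOY_VECTORS, c=2)
print(hard_labels(memberships))
```

In the setting the abstract describes, the input would instead be the 531 index vectors and `c` would be 20. Soft memberships are what distinguish this from k-means: an index whose vector lies between two groups keeps a partial membership in both rather than being forced into one.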