Conference: International Conference on Information Technology and Applications in Biomedicine

Weighted Amino Acid Composition based on Amino Acid Indices for Prediction of Protein Structural Classes


Abstract

Prediction of protein structural classes is one of the most important and challenging tasks in bioinformatics. A protein is assigned to one of four main structural classes: all-α, all-β, α/β and α+β. This paper investigates combining amino acid indices (AAI) with traditional amino acid composition (AAC) to create a weighted amino acid composition (WAAC) feature set for predicting the structural class of a protein. More than 500 amino acid indices are available for building this feature set, which has great potential to increase the accuracy of structural class prediction. The indices are evaluated on a high-quality 40% homology dataset containing over 7000 protein sequences (the largest of its kind) extracted from proteomic databases. The predictive technique developed is an optimum k-nearest-neighbour classifier named multiple-k-nearest-neighbour (MKNN), and a 10-fold cross-validation test procedure is used to evaluate it throughout the study. Over 1 million analyses were carried out. The highest accuracy, 48.35%, was obtained with index LEVM780101; this is 9% higher than traditional AAC and 6.6% higher than the best sequence-driven feature subset used in other studies. There is great potential for further improvement, since WAAC is the feature set with the fewest attributes and uses no feature selection. Of the 548 amino acid indices analysed in this study, 536 yielded higher accuracy than traditional AAC and 435 yielded higher accuracy than the other sequence-driven features.
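The abstract does not give the WAAC formula or the internals of MKNN, so the following is only a minimal sketch under stated assumptions. It assumes that each of the 20 composition values is multiplied by the matching amino acid index value, and it reads "multiple-k" as a plain k-nearest-neighbour vote pooled over several values of k. The function names, the `toy_index` values, the sequences and the class labels are all hypothetical and are not taken from the paper or from LEVM780101.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: fraction of each standard residue."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def waac(sequence, index):
    """Assumed weighting: each composition value times its index value."""
    return [f * index[a] for f, a in zip(aac(sequence), AMINO_ACIDS)]


def knn_vote(train, query, k):
    """Majority label among the k nearest training vectors (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def multi_k_predict(train, query, ks=(1, 3, 5)):
    """Assumed reading of 'multiple-k': pool one vote per value of k."""
    votes = Counter(knn_vote(train, query, k) for k in ks)
    return votes.most_common(1)[0][0]


# Made-up index values for illustration only (A=1.0, C=2.0, ..., Y=20.0).
toy_index = {a: float(i + 1) for i, a in enumerate(AMINO_ACIDS)}

# Made-up sequences with arbitrary labels; not real proteins.
toy_data = [
    ("AAAC", "class-1"), ("AACA", "class-1"), ("AAAA", "class-1"),
    ("YYYW", "class-2"), ("YYWW", "class-2"), ("WWWY", "class-2"),
]
train = [(waac(seq, toy_index), label) for seq, label in toy_data]

features = waac("ACCA", toy_index)
predicted = multi_k_predict(train, waac("AACC", toy_index))
```

The feature vector keeps the same 20 attributes as plain AAC, which matches the abstract's point that WAAC adds no attributes. A real experiment would load index values from an amino acid index database and wrap the prediction step in 10-fold cross-validation.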
