
Protein secondary structure prediction from amino acid sequence using artificial intelligence technique


Abstract

Large genome sequencing projects generate a huge number of protein sequences in their primary structures, and it is difficult for conventional biological techniques to determine the corresponding 3D structures and, from them, the functions of these proteins. Protein secondary structure prediction is a prerequisite step in determining the 3D structure of a protein. In this research, a method for predicting protein secondary structure is proposed and implemented together with other known accurate methods in this domain. The methods are discussed and presented as a progressive comparative analysis to allow easy comparison and clear conclusions. A benchmark data set is used to train and test all methods under the same hardware, platforms, and environments. The newly developed method combines the knowledge of the GORV information theory with the power of neural networks to classify a novel protein sequence into one of three secondary structure classes. NN-GORV-I is developed and implemented to predict protein secondary structure using the biological information conserved in neighboring residues and related sequences. The method is then improved by a filtering mechanism for the searched sequences, yielding its advanced version, NN-GORV-II. The newly developed method is rigorously tested together with the other methods and is observed to reach an accuracy above 80%. Its prediction accuracy and quality are superior to those of all six methods developed or examined in this research and to those reported in this domain. The Matthews correlation coefficients (MCC) show that the secondary structure states predicted by NN-GORV-II are highly correlated with the observed secondary structure states. The NN-GORV-II method is further tested using five DSSP reduction schemes and is found to be stable and reliable in its prediction ability.
An additional blind test on sequences that were not used in the training and testing procedures is conducted, and the experimental results show that NN-GORV-II predictions are of high accuracy, quality, and stability. The Receiver Operating Characteristic (ROC) curve and the area under the curve (AUC) are applied as a novel procedure for assessing a multi-class classifier in which one, and only one, class has a probability of approximately 0.5. The ROC and AUC results show that NN-GORV-II successfully discriminates between two classes: coils and not-coils.
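The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the 8-to-3 DSSP mapping below is one commonly used reduction (the abstract does not list its five schemes), the function names are hypothetical, and the sequences and coil scores are made-up toy values. The sketch computes three-state accuracy, a one-vs-rest MCC, and the AUC for the coil versus not-coil decision using the rank-based (Mann-Whitney) identity.

```python
from math import sqrt

# Assumed reduction scheme (not taken from the abstract):
# H, G, I -> H (helix); E, B -> E (strand); all other DSSP states -> C (coil).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(states):
    """Map an 8-state DSSP string to a three-state string over H, E, C."""
    return "".join(DSSP_TO_3.get(s, "C") for s in states)


def q3(observed, predicted):
    """Fraction of residues whose three-state label is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def mcc(observed, predicted, cls):
    """Matthews correlation coefficient for class `cls` versus the rest."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == cls and p == cls for o, p in pairs)
    tn = sum(o != cls and p != cls for o, p in pairs)
    fp = sum(o != cls and p == cls for o, p in pairs)
    fn = sum(o == cls and p != cls for o, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, for illustration only.
observed = reduce_dssp("HHHHTTEEEE")   # "HHHHCCEEEE"
predicted = "HHHCCCEEEC"
coil_scores = [0.1, 0.2, 0.1, 0.6, 0.9, 0.8, 0.2, 0.1, 0.3, 0.7]
is_coil = [o == "C" for o in observed]

print("Q3:", q3(observed, predicted))  # 8 of 10 residues match
print("MCC (coil):", mcc(observed, predicted, "C"))
print("AUC (coil vs not-coil):", auc(is_coil, coil_scores))
```

Treating coil as the positive class turns the three-state problem into the binary one that a ROC curve requires, which is the framing the abstract describes.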
