Enhancing protein fold prediction accuracy using new physicochemical-based features and fusion of heterogeneous classifiers.

Abstract

One of the most challenging research areas in bioinformatics is predicting the tertiary structure of a protein from its amino acid sequence. The difficulties of this task, such as limited knowledge about protein structural stability and about how amino acids interact with each other along a protein's sequence, have kept it an open research problem in bioinformatics and molecular biology.

Recently, owing to tremendous advances in pattern recognition, machine learning, and artificial intelligence (AI), there has been great interest in applying intelligent approaches to the protein fold prediction problem. To enhance protein fold prediction accuracy with pattern recognition-based approaches, three factors should be considered: the prediction performance of the applied classifier, the discriminatory information of the extracted features, and the compatibility of the applied classifier with the extracted features. In this research we aim to solve the protein fold prediction problem using pattern recognition-based approaches, namely fusion methods and new physicochemical-based features.

To explore the prediction performance of different classifiers on the protein fold prediction task, we conducted a comparison study of seven classifiers: Multi-Layer Perceptron (MLP), Support Vector Machine (SVM), K-Nearest Neighbor, C4.5, Naive Bayes, AdaBoost.M1, and LogitBoost. These classifiers were chosen for their popularity and for the results they achieved in previous works.

Based on the findings of our comparison study, a new fusion of heterogeneous classifiers (AdaBoost.M1, LogitBoost, Naive Bayes, MLP, and SVM) is proposed to tackle this problem. The proposed method aims to enhance protein fold prediction accuracy by exploiting the discriminatory ability of different classifiers (diversity within the classifier ensemble) to improve the overall performance of the combined classifier, instead of relying on the strength of an individual classifier. To the best of our knowledge, the proposed method enhances protein fold prediction accuracy compared with the other studies found in the literature.

In addition, two meta-classifiers, Rotation Forest and Random Forest, have been employed to tackle the protein fold prediction problem. Our experimental results show that these methods outperform most of the works found in the literature while also reducing the time consumption of this task.

To explore the discriminatory power of features, new feature groups have been extracted based on the physical and physicochemical properties of the amino acids. The effectiveness of the extracted feature groups has been studied using the three popular classifiers that consistently performed better than the other classifiers employed (MLP, SVM, and AdaBoost.M1). The results show that, considering the number of features, the extracted features are more effective than the features proposed in previous works.

Finally, our proposed method has been applied to different combinations of the extracted features to investigate the compatibility of the proposed classifier with the extracted features. Our experimental results show that using the proposed method with the combination of the new features enhances protein fold prediction accuracy more than using each of them individually. The proposed approaches also show lower time consumption, relative to their prediction performance, than the other methods that have been used to tackle the protein fold prediction problem.

In this study, a new fusion of heterogeneous classifiers and new physicochemical-based features have been proposed to tackle the protein fold prediction problem. The proposed approaches enhance the prediction performance of this task on the two most popular benchmarks that have been widely used in previous works.
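The abstract does not state which rule is used to fuse the heterogeneous classifiers, so the following is only a minimal sketch of the general idea, assuming simple plurality voting over the labels returned by independently trained base classifiers. The function `majority_vote_fusion` and the three toy rules are hypothetical stand-ins for illustration; they are not the thesis's models or its fusion scheme.

```python
from collections import Counter
from typing import Callable, Sequence

# A "classifier" here is any callable mapping a feature vector to a fold label.
Classifier = Callable[[Sequence[float]], str]


def majority_vote_fusion(classifiers: Sequence[Classifier],
                         features: Sequence[float]) -> str:
    """Fuse base classifiers by plurality vote (an assumed rule)."""
    if not classifiers:
        raise ValueError("at least one base classifier is required")
    votes = [clf(features) for clf in classifiers]
    counts = Counter(votes)
    best = max(counts.values())
    # Ties go to the earliest classifier whose label has the top count.
    return next(vote for vote in votes if counts[vote] == best)


# Hypothetical stand-ins for trained models such as MLP, SVM, or Naive Bayes.
def threshold_rule(x: Sequence[float]) -> str:
    return "alpha" if x[0] > 0.5 else "beta"


def sum_rule(x: Sequence[float]) -> str:
    return "alpha" if sum(x) > 1.0 else "beta"


def constant_rule(x: Sequence[float]) -> str:
    return "beta"


ensemble = [threshold_rule, sum_rule, constant_rule]
prediction = majority_vote_fusion(ensemble, [0.9, 0.4])
```

Voting only helps when the base classifiers make different errors, which is the diversity argument made in the abstract. Other fusion rules, such as averaging class probabilities, follow the same pattern.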
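The abstract does not list the physicochemical feature groups that were extracted. As a rough illustration of what a physicochemical-based feature can look like, the sketch below maps each residue to its Kyte-Doolittle hydropathy value and summarizes the resulting profile. The choice of scale, the function `hydropathy_features`, and the segment-mean layout are assumptions made for this example, not the features proposed in the thesis.

```python
from typing import List

# Kyte-Doolittle hydropathy index, one widely used physicochemical scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_features(sequence: str, segments: int = 2) -> List[float]:
    """Return [global mean, hydrophobic fraction, mean per segment...]."""
    # Non-standard residue codes are skipped rather than guessed.
    values = [KYTE_DOOLITTLE[res] for res in sequence.upper()
              if res in KYTE_DOOLITTLE]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    n = len(values)
    features = [sum(values) / n, sum(v > 0 for v in values) / n]
    # Segment means keep coarse positional information along the chain.
    for i in range(segments):
        chunk = values[i * n // segments:(i + 1) * n // segments]
        features.append(sum(chunk) / len(chunk) if chunk else 0.0)
    return features


example = hydropathy_features("AILK", segments=2)
```

The resulting fixed-length vector can be passed to any classifier regardless of sequence length, which is what makes such descriptors convenient for fold prediction.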

Bibliographic record

  • Author

    Dehzangi, Abdollah

  • Author affiliation

    Multimedia University (Malaysia)

  • Degree-granting institution: Multimedia University (Malaysia)
  • Subject: Biogeochemistry; Information Technology; Biology Bioinformatics
  • Degree: M.Sc.
  • Year: 2011
  • Pages: 137 p.
  • Total pages: 137
  • Original format: PDF
  • Text language: eng
