Conference: 2015 International Conference on Futuristic trend on Computational Analysis and Knowledge Management

Classification of enzyme functional classes and subclasses using support vector machine

Abstract

Enzymes catalyze biochemical reactions and therefore play an important role in metabolism. Determining enzyme function experimentally is costly and time-consuming, so a computational method for predicting enzyme function is needed. This paper presents a supervised machine learning approach that predicts the functional class and subclass of protein sequences, covering both enzymes and non-enzymes, from 857 sequence-derived features. The features come from seven sequence-derived properties: amino acid composition, dipeptide composition, correlation features, composition, transition, distribution, and pseudo amino acid composition. Recursive feature elimination (RFE) is used to select the optimal number of features. A support vector machine (SVM) is used to build a three-level model on the features selected by SVM-RFE: the first (top) level classifies a query protein as an enzyme or non-enzyme, the second level predicts the enzyme functional class, and the third level predicts the functional subclass. The proposed model reports an overall accuracy of 97.6%, precision of 97.8%, and Matthews correlation coefficient (MCC) of 0.93 at the first level; accuracy of 87.3%, precision of 87.7%, and MCC of 0.84 at the second level; and accuracy of 85.6%, precision of 87.9%, and MCC of 0.86 at the third level.
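
To make two of the seven feature groups and the MCC metric concrete, here is a minimal Python sketch using only the standard library. It assumes the usual textbook definitions: amino acid composition as the fraction of each of the 20 standard residues (20 values), dipeptide composition as the fraction of each ordered residue pair among adjacent pairs (400 values), and the binary form of MCC as used for an enzyme versus non-enzyme decision. The abstract does not give the paper's exact definitions, so the function names and the handling of non-standard residues are illustrative assumptions, not the authors' implementation.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 features)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Assumption: non-standard residues count toward the length only.
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered residue pair among adjacent pairs (400 features)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence needs at least two residues")
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    total = len(seq) - 1
    return [counts.get(a + b, 0) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for labels 1 (enzyme) and 0 (non-enzyme)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    toy = "ACAC"  # toy sequence, not real data
    features = amino_acid_composition(toy) + dipeptide_composition(toy)
    print(len(features))
    print(binary_mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Concatenating the two vectors gives 420 of the 857 features; the remaining five groups would be appended in the same way. The second and third levels are multi-class problems, so their MCC values would need the multi-class generalization rather than this binary form.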
