Journal: Neurocomputing

Using a low correlation high orthogonality feature set and machine learning methods to identify plant pentatricopeptide repeat coding gene/protein

Abstract

Motivation: Identifying whether a pentatricopeptide repeat (PPR) exists in an amino acid sequence is a significant task in bioinformatics. To address this problem, an identification method that combines an optimal feature set selection framework with machine learning algorithms is proposed to recognize PPR coding genes and proteins from amino acid sequences. The original 188-dimensional (188D) features are obtained using a feature extraction method and are then successively optimised through covariance analysis, max-relevant-max-distance processing, and principal component analysis, yielding an optimal feature set with fewer but more expressive features. Four machine learning methods then serve as classifiers for the identification task.

Results: The feature dimensionality is reduced from 188 to only 10. According to the experimental results with the support vector machine, the losses in AUC and F-1 are only 3.26% and 10.1%, respectively. Moreover, with J48, random forest, and naive Bayes as classifiers, the 10-dimensional optimal feature set also shows almost equivalent performance in a 10-fold validation test. (c) 2020 Elsevier B.V. All rights reserved.
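
The abstract does not spell out how the covariance analysis step removes redundant features, so the following is only an illustrative sketch of one common approach, not the authors' procedure. It assumes a greedy filter that keeps a feature only if its absolute Pearson correlation with every previously kept feature stays below a threshold. The function names `pearson` and `drop_correlated`, the 0.9 threshold, and the toy data are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.9):
    """Return indices of features kept by a greedy low-correlation filter.

    `columns` is a list of feature columns (one list of values per feature).
    The threshold is an assumed value, not one taken from the paper.
    """
    kept = []
    for j, col in enumerate(columns):
        # Keep feature j only if it is weakly correlated with all kept features.
        if all(abs(pearson(col, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy data (hypothetical): three independent features plus a near-copy of feature 0.
rng = random.Random(0)
base = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(3)]
noisy_copy = [v + 0.01 * rng.gauss(0, 1) for v in base[0]]
columns = base + [noisy_copy]

kept = drop_correlated(columns, threshold=0.9)
print("kept feature indices:", kept)
```

In a pipeline like the one described, a filter of this kind would run first on the 188 extracted features, with max-relevant-max-distance ranking and principal component analysis applied to whatever survives.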
