Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics

Deep Robust Framework for Protein Function Prediction Using Variable-Length Protein Sequences

Abstract

The order of amino acids in a protein sequence enables the protein to acquire a conformation suitable for performing its functions, which motivates analyzing these sequences to predict function. Although machine-learning-based approaches are fast compared to methods that use BLAST, FASTA, etc., they fail to perform well on long protein sequences (more than 300 amino acids). In this paper, we introduce a novel method for constructing two separate feature sets for proteins using a bi-directional long short-term memory network, based on the analysis of fixed 1) single-sized segments and 2) multi-sized segments. The model trained on the proposed feature set based on multi-sized segments is combined with a model trained on state-of-the-art Multi-label Linear Discriminant Analysis (MLDA) features to further improve accuracy. Extensive evaluations on separate datasets for biological processes and molecular functions show not only improved results for long sequences but also significantly higher overall accuracy than the state-of-the-art method. The single-sized approach improves on the MLDA-based classifier by +3.37 percent for biological processes and +5.48 percent for molecular functions. The corresponding numbers for the multi-sized approach are +5.38 and +8.00 percent. Combining the two models raises the improvement further, to +7.41 and +9.21 percent, respectively.
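The segment-based analysis mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the segment sizes, whether segments overlap, or how a short final segment is handled, so everything below is an assumption made for illustration: the function names `split_into_segments` and `multi_size_segments` are hypothetical, segments are taken as non-overlapping by default, and a shorter trailing segment is simply kept. The sketch covers only the splitting step, not the bi-directional LSTM feature extraction or the MLDA model.

```python
from typing import Dict, List, Optional, Sequence


def split_into_segments(sequence: str, size: int,
                        stride: Optional[int] = None) -> List[str]:
    """Cut a protein sequence into segments of at most `size` residues.

    Assumption: segments do not overlap unless a smaller `stride` is given,
    and the trailing segment may be shorter than `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    step = size if stride is None else stride
    if step <= 0:
        raise ValueError("stride must be positive")
    return [sequence[i:i + size] for i in range(0, len(sequence), step)]


def multi_size_segments(sequence: str,
                        sizes: Sequence[int]) -> Dict[int, List[str]]:
    """Segment the same sequence once per size (hypothetical multi-sized view)."""
    return {size: split_into_segments(sequence, size) for size in sizes}


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQR"  # toy example, not taken from the paper's data
    print(split_into_segments(toy_sequence, 4))
    print(multi_size_segments(toy_sequence, [3, 5]))
```

In a pipeline of the kind the abstract describes, each segment would then be encoded and passed to a sequence model; that part is omitted here because the abstract gives no details about it.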