Conference: International Conference on Practical Applications of Computational Biology Bioinformatics

Moment Vector Encoding of Protein Sequences for Supervised Classification

Abstract

Automated prediction of biological attributes of protein sequences with machine learning methods depends on a well-suited protein representation. A central challenge is to represent variable-length sequences as fixed-length feature vectors. In this paper we introduce a new approach for representing protein sequences as fixed-length vectors, based on statistical moments applied directly to the values of the physicochemical properties of amino acids. The results show that, on four benchmarks, this encoding gives higher prediction accuracy than previous approaches that applied moments to complex descriptors extracted from the physicochemical properties, and higher accuracy than the PseAAC encoding method. The best results are achieved when highly correlated features are removed with principal component analysis.
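The encoding idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which properties or which moments were used, so the choice of the Kyte-Doolittle hydropathy index as the property scale, the use of the mean plus central moments of order 2 to 4, and the function names `moment_vector` and `encode` are all assumptions made for illustration. The principal component analysis step is omitted.

```python
# Example property scale (assumption): Kyte-Doolittle hydropathy index.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moment_vector(sequence, scale=HYDROPATHY, max_order=4):
    """Map a sequence to [mean, central moments of order 2..max_order].

    Hypothetical helper: residues missing from the scale are skipped.
    """
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if not values:
        raise ValueError("sequence has no residues covered by the scale")
    n = len(values)
    mean = sum(values) / n
    features = [mean]
    for order in range(2, max_order + 1):
        # Population central moment of the given order.
        features.append(sum((v - mean) ** order for v in values) / n)
    return features


def encode(sequence, scales, max_order=4):
    """Concatenate the moment vectors of several property scales."""
    features = []
    for scale in scales:
        features.extend(moment_vector(sequence, scale, max_order))
    return features


if __name__ == "__main__":
    short = encode("MKTAYIAKQR", [HYDROPATHY])
    longer = encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", [HYDROPATHY])
    # Both vectors have the same length regardless of sequence length.
    print(len(short), len(longer))
```

The output length depends only on the number of property scales and the number of moments, not on the sequence length, which is what lets variable-length proteins be fed to a standard classifier. With many scales the resulting features are likely to be correlated, which is the situation the abstract addresses with principal component analysis.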