Conference: 2011 Annual IEEE India Conference: Engineering Sustainable Solutions

Protein superfamily classification using Kernel Principal Component Analysis and Probabilistic Neural Networks


Abstract

This paper implements a Probabilistic Neural Network (PNN) for the protein superfamily classification problem. The classification task organizes proteins into their superfamilies and helps predict the structure and function of newly discovered proteins. The two main steps in any pattern classification problem are feature selection and feature extraction. A bi-gram hashing function extracts and counts the occurrences of bi-gram patterns in long amino acid sequences. The bi-gram method maps sequences of different lengths into input vectors of the same length, but its major drawback is that the input feature vector tends to be very large. Selecting the optimal number of features remains a critical issue for any pattern classification problem. Principal Component Analysis (PCA), a powerful statistical technique, is used to reduce the dimension of the large input vector without much loss of information, thereby identifying patterns in high-dimensional data. Traditional PCA applies a linear transformation, whereas Kernel PCA (KPCA) is used when the data are distributed nonlinearly. Numerical simulations show that, for nonlinearly distributed protein data, KPCA outperforms PCA in terms of accuracy, sensitivity, and specificity.
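The pipeline described in the abstract (bi-gram counting, kernel PCA, then a PNN) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the hashing scheme, the kernel, or any parameter values, so the dictionary-based bi-gram index, the RBF kernel, the `gamma` and `sigma` values, and all function names here are assumptions. The toy sequences and labels are made up.

```python
from itertools import product
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# Assumed stand-in for the paper's hashing function: one slot per ordered pair.
BIGRAM_INDEX = {a + b: i for i, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}

def bigram_counts(sequence):
    """Map a sequence of any length to a fixed 400-dimensional count vector."""
    vec = np.zeros(len(BIGRAM_INDEX))
    for i in range(len(sequence) - 1):
        idx = BIGRAM_INDEX.get(sequence[i:i + 2])
        if idx is not None:  # skip bi-grams containing non-standard residues
            vec[idx] += 1
    return vec

def rbf_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_pca(X, n_components, gamma):
    """Project the rows of X onto their leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one  # centre the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

def pnn_predict(train_X, train_y, test_X, sigma):
    """PNN decision rule: average Gaussian activation per class, pick the largest."""
    classes = np.unique(train_y)
    K = rbf_kernel(test_X, train_X, 1.0 / (2 * sigma**2))
    scores = np.stack([K[:, train_y == c].mean(axis=1) for c in classes], axis=1)
    return classes[scores.argmax(axis=1)]

# Made-up toy sequences and labels, for illustration only.
seqs = ["ACDACDACD", "ACDACDAC", "WYWYWYWY", "WYWYWYW"]
labels = np.array([0, 0, 1, 1])
X = np.stack([bigram_counts(s) for s in seqs])  # shape (4, 400)
Z = kernel_pca(X, n_components=2, gamma=0.01)   # shape (4, 2)
pred = pnn_predict(Z[[0, 2]], labels[[0, 2]], Z[[1, 3]], sigma=1.0)
```

Note that `kernel_pca` as written projects all sequences together, so the held-out rows influence the components. A stricter evaluation would fit the components on the training rows only and then project new sequences through the centred kernel.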
