...
Journal: Bioinformatics

Classifying G-protein coupled receptors with support vector machines

Abstract

Motivation: The enormous amount of protein sequence data uncovered by genome research has increased the demand for computer software that can automate the recognition of new proteins. We discuss the relative merits of various automated methods for recognizing G-Protein Coupled Receptors (GPCRs), a superfamily of cell membrane proteins. GPCRs are found in a wide range of organisms and are central to a cellular signalling network that regulates many basic physiological processes. Because they play a key role in many diseases, they are the focus of a significant amount of current pharmaceutical research. However, their tertiary structures remain largely unsolved. The methods described in this paper use only primary sequence information to make their predictions. We compare a simple nearest-neighbor approach (BLAST), methods based on multiple alignments generated by a statistical profile Hidden Markov Model (HMM), and methods, including Support Vector Machines (SVMs), that transform protein sequences into fixed-length feature vectors.

Results: The last of these is the most computationally expensive approach, but our experiments show that, for those interested in annotation-quality classification, the results are worth the effort. In two-fold cross-validation experiments testing recognition of GPCR subfamilies that bind a specific ligand (such as a histamine molecule), the errors per sequence at the Minimum Error Point (MEP) were 13.7% for multi-class SVMs, 17.1% for our SVMtree method of hierarchical multi-class SVM classification, 25.5% for BLAST, 30% for profile HMMs, and 49% for Kernel Nearest Neighbor (kernNN), which classifies by nearest neighbor in feature-vector space. The percentage of true positives recognized before the first false positive was 65% for both SVM methods, 13% for BLAST, 5% for profile HMMs and 4% for kernNN.
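The two figures of merit reported above can be made concrete with a small scoring sketch. It assumes (these details are not stated in the abstract) that a classifier emits one real-valued score per test sequence for a single subfamily versus the rest, that a sequence is called a member when its score is at or above a threshold, that "errors" means false positives plus false negatives, and that a true member tied with the best-scoring non-member is not counted as recognized. The function names `minimum_error_point` and `tp_before_first_fp` are hypothetical, and the paper's own multi-class bookkeeping may differ.

```python
import math
from typing import Sequence, Tuple


def minimum_error_point(scores: Sequence[float],
                        labels: Sequence[bool]) -> Tuple[float, float]:
    """Return (threshold, errors per sequence) at the threshold minimizing FP + FN.

    Assumption: a sequence is predicted to be a subfamily member when
    score >= threshold; among equally good thresholds the lowest one wins.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    # Candidate thresholds: each observed score, plus +inf (predict no members).
    candidates = sorted(set(scores)) + [math.inf]
    best_threshold, best_errors = math.inf, len(scores) + 1
    for t in candidates:
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        false_neg = sum(1 for s, y in zip(scores, labels) if s < t and y)
        if false_pos + false_neg < best_errors:
            best_threshold, best_errors = t, false_pos + false_neg
    return best_threshold, best_errors / len(scores)


def tp_before_first_fp(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Fraction of true members scored strictly above the best-scoring non-member."""
    member_scores = [s for s, y in zip(scores, labels) if y]
    if not member_scores:
        raise ValueError("at least one true member is required")
    # With no non-members, every member precedes the (nonexistent) first false positive.
    top_non_member = max((s for s, y in zip(scores, labels) if not y), default=-math.inf)
    return sum(1 for s in member_scores if s > top_non_member) / len(member_scores)
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.4, 0.2]` and labels `[True, True, False, True, False, False]`, the minimum error point is the threshold 0.6 (one false positive, so 1/6 ≈ 0.167 errors per sequence), and 2 of the 3 true members rank above the first false positive. Scanning every observed score as a candidate threshold is quadratic, which keeps the sketch simple; a sorted single pass would scale better.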