PROFILE-BASED STRING KERNELS FOR REMOTE HOMOLOGY DETECTION AND MOTIF EXTRACTION

Rui Kuang; Eugene Ie; Ke Wang; Kai Wang; Mahira Siddiqi; Yoav Freund; Christina Leslie

Abstract

We introduce novel profile-based string kernels for use with support vector machines (SVMs) for the problems of protein classification and remote homology detection. These kernels use probabilistic profiles, such as those produced by the PSI-BLAST algorithm, to define position-dependent mutation neighborhoods along protein sequences for inexact matching of k-length subsequences ("k-mers") in the data. By use of an efficient data structure, the kernels are fast to compute once the profiles have been obtained. For example, the time needed to run PSI-BLAST in order to build the profiles is significantly longer than both the kernel computation time and the SVM training time. We present remote homology detection experiments based on the SCOP database where we show that profile-based string kernels used with SVM classifiers strongly outperform all recently presented supervised SVM methods. We further examine how to incorporate predicted secondary structure information into the profile kernel to obtain a small but significant performance improvement. We also show how we can use the learned SVM classifier to extract "discriminative sequence motifs" — short regions of the original profile that contribute almost all the weight of the SVM classification score — and show that these discriminative motifs correspond to meaningful structural features in the protein data. The use of PSI-BLAST profiles can be seen as a semi-supervised learning technique, since PSI-BLAST leverages unlabeled data from a large sequence database to build more informative profiles. Recently presented "cluster kernels" give general semi-supervised methods for improving SVM protein classification performance. We show that our profile kernel results also outperform cluster kernels while providing much better scalability to large datasets.
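The position-dependent mutation neighborhood idea can be illustrated with a small sketch. The abstract does not give the formula, so the code assumes one plausible formulation: a k-mer belongs to the neighborhood of a profile window when its summed negative log-probability under that window is below a threshold `sigma`. The names `mutation_neighborhood`, `profile_features`, and `profile_kernel`, the two-letter toy alphabet, and the dictionary-based profile format are all hypothetical. The sketch enumerates k-mers by brute force, which is exponential in k, so it does not reproduce the efficient data structure the abstract refers to.

```python
from collections import Counter
from itertools import product
from math import log

# Toy alphabet (assumption for illustration; real proteins use 20 amino acids).
ALPHABET = "AC"


def mutation_neighborhood(profile_window, sigma):
    """Return k-mers whose summed -log p under the window is below sigma.

    profile_window is a list of dicts mapping each letter to its emission
    probability at that position (a hypothetical profile format).
    """
    neighbors = []
    for kmer in product(ALPHABET, repeat=len(profile_window)):
        cost = 0.0
        for column, letter in zip(profile_window, kmer):
            p = column.get(letter, 0.0)
            if p <= 0.0:
                cost = float("inf")  # impossible letter: never in the neighborhood
                break
            cost -= log(p)
        if cost < sigma:
            neighbors.append("".join(kmer))
    return neighbors


def profile_features(profile, k, sigma):
    """Count, per k-mer, how many length-k windows accept it as a neighbor."""
    features = Counter()
    for start in range(len(profile) - k + 1):
        features.update(mutation_neighborhood(profile[start:start + k], sigma))
    return features


def profile_kernel(profile_x, profile_y, k, sigma):
    """Inner product of the two sparse k-mer count vectors."""
    fx = profile_features(profile_x, k, sigma)
    fy = profile_features(profile_y, k, sigma)
    return sum(count * fy[kmer] for kmer, count in fx.items())
```

In this sketch, a larger `sigma` admits more k-mers per window and so allows looser inexact matching, while a small `sigma` keeps only the k-mers the profile strongly supports at each position.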


Bibliographic record

Source: Journal of Bioinformatics and Computational Biology, 2005, Issue 3, 24 pages
Authors: Rui Kuang; Eugene Ie; Ke Wang; Kai Wang; Mahira Siddiqi; Yoav Freund; Christina Leslie
Original format: PDF
Language: English
Chinese Library Classification: Cell biology
Keywords: Protein classification; support vector machine; kernels; protein motifs

Similar literature

1. PROFILE-BASED STRING KERNELS FOR REMOTE HOMOLOGY DETECTION AND MOTIF EXTRACTION [J]. Rui Kuang, Eugene Ie, Ke Wang. Journal of Bioinformatics and Computational Biology. 2005, Issue 3
2. Profile-based direct kernels for remote homology detection and fold recognition [J]. Huzefa Rangwala, George Karypis. Bioinformatics. 2005, Issue 23
3. Motif kernel generated by genetic programming improves remote homology and fold detection [J]. Tony Håndstad, Arne JH Hestnes, Pål Sætrom. BMC Bioinformatics. 2007, Issue 1
4. Profile-based string kernels for remote homology detection and motif extraction [C]. Rui Kuang, Ie, E. 2004
5. Remote Homology Detection in Proteins Using Graphical Models [D]. Daniels, Noah Manus. 2013
6. Motif kernel generated by genetic programming improves remote homology and fold detection [O]. Tony Håndstad, Arne JH Hestnes, Pål Sætrom. 2007
7. Profile-based string kernels for remote homology detection and motif extraction [O]. Rui Kuang, Eugene Ie, Kai Wang. 2004
8. Profile Based Direct Kernels for Remote Homology Detection and Fold Recognition [R]. Rangwala, H., Karypis, G. 2005
