Journal: Computer Engineering & Science (计算机工程与科学)

A Sequence Recognition Method Based on Similarity Learning of Multi-order Distorted Sub-patterns

     

Abstract

Sequence recognition is important to many application domains. Because of the influence of many factors, sequences that share the same class label are often not strictly similar to one another. Measuring sequence similarity at varying scales helps produce more accurate similarity measures, so this paper proposes a sequence recognition method based on multi-order distorted sub-patterns (distorted subsequences). A feature space spanned by the distorted sub-patterns is defined, together with a kernel function on that space which takes distortions of various degrees into account, and an algorithm with linear cost is designed to compute the high-dimensional feature vectors efficiently. A combination of the kernel matrices for different distortion orders is then learned and optimized through semidefinite programming (SDP). Combining the optimized kernel with a support vector machine (SVM) yields a classifier with a soft, flexible boundary that adapts well to the varying degrees of distortion between sequences. Experiments on the SCOP 1.37 PDB90 protein benchmark show that the method generally improves recognition accuracy for most protein sequences in the 33 families of the benchmark.
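
The abstract does not define a "distorted sub-pattern" precisely, so the sketch below is an illustration, not the paper's algorithm. It assumes, in the spirit of the mismatch string kernel, that a distortion of order m means a length-k window allowed to differ from a feature k-mer in at most m substituted positions. It builds sparse feature vectors (cost linear in sequence length for fixed k, m and alphabet), forms one cosine-normalized kernel matrix per distortion order, and combines them with non-negative weights. The function names, the toy sequences and the fixed weights are hypothetical; in the paper the combination weights are learned by SDP, which is not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations, product


def distorted_neighbors(kmer, m, alphabet):
    """Return every string over `alphabet` within m substitutions of `kmer` (assumed distortion model)."""
    neighbors = set()
    for d in range(min(m, len(kmer)) + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                chars = list(kmer)
                for pos, ch in zip(positions, letters):
                    chars[pos] = ch
                neighbors.add("".join(chars))
    return neighbors


def feature_vector(seq, k, m, alphabet):
    """Sparse feature map: phi[b] counts the windows of `seq` within m substitutions of k-mer b.

    For fixed k, m and alphabet the neighbourhood size is constant,
    so the cost grows linearly with len(seq).
    """
    phi = Counter()
    for i in range(len(seq) - k + 1):
        for b in distorted_neighbors(seq[i:i + k], m, alphabet):
            phi[b] += 1
    return phi


def dot(phi_x, phi_y):
    """Inner product of two sparse feature vectors, iterating over the smaller one."""
    if len(phi_x) > len(phi_y):
        phi_x, phi_y = phi_y, phi_x
    return sum(v * phi_y[b] for b, v in phi_x.items())


def kernel_matrix(seqs, k, m, alphabet):
    """Cosine-normalised kernel matrix for a single distortion order m."""
    feats = [feature_vector(s, k, m, alphabet) for s in seqs]
    raw = [[dot(a, b) for b in feats] for a in feats]
    n = len(seqs)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = math.sqrt(raw[i][i] * raw[j][j])
            K[i][j] = raw[i][j] / denom if denom else 0.0
    return K


def combine(kernels, weights):
    """Weighted sum of kernel matrices; non-negative weights keep the result positive semidefinite."""
    n = len(kernels[0])
    return [[sum(w * Km[i][j] for w, Km in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
seqs = ["ACDEFG", "ACDQFG", "WWYYWW"]  # hypothetical toy sequences
mats = [kernel_matrix(seqs, k=3, m=m, alphabet=AMINO_ACIDS) for m in (0, 1)]
# Hypothetical fixed weights; the paper instead learns the combination by SDP.
K = combine(mats, [0.5, 0.5])
```

With exact matching (m = 0) the first two toy sequences share one of four 3-mers, giving a normalized similarity of 0.25; allowing one substitution (m = 1) raises it to 118/232, which illustrates why admitting distortions can make same-family sequences look more alike. The combined matrix `K` could then be passed to any SVM implementation that accepts a precomputed kernel. Sparse dictionaries are used because the explicit feature space has 20^k dimensions, almost all of which are zero for a given sequence.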
