Fast Classification of Protein Structures by an Alignment-Free Kernel

Abstract

Alignment is the most fundamental algorithm in bioinformatics and has been widely used in numerous studies, but its computational cost has become prohibitive for many modern problems because of recent explosive data growth. Hence the development of alignment-free algorithms, i.e., alternative algorithms that avoid computationally expensive alignment, has recently become an active topic in algorithmic bioinformatics. The analysis of protein structures is a very important problem in bioinformatics. We focus on the problem of predicting the functions of proteins from their structures, because protein functions are key to understanding any organism and, moreover, these functions are said to be determined by protein structures. However, the previous best-known (i.e., most accurate) method for this problem is an alignment-based kernel method, which suffers from the high computational cost of alignments. For this problem, we propose a new kernel method that does not employ alignments. Instead of alignments, we apply the two-dimensional suffix tree and the contact map graph to reduce kernel-related computation cost dramatically. Experiments show that, compared with the previous best algorithm, our new method runs about 16 times faster in training and about 37 times faster in prediction while preserving comparatively high accuracy.
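For readers unfamiliar with the contact map mentioned in the abstract, the following is a minimal sketch of how a contact map is commonly built from residue coordinates. It is not the authors' implementation: the abstract does not say how the map is constructed or how it is combined with the two-dimensional suffix tree. The function name `contact_map`, the use of one 3D point per residue, and the 8 Å distance threshold are all assumptions made for illustration.

```python
import math


def contact_map(coords, threshold=8.0):
    """Return a symmetric binary contact map as a list of lists.

    coords    -- list of (x, y, z) tuples, one per residue (assumed input)
    threshold -- distance cutoff in angstroms (assumed value, not from the paper)
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Two residues are "in contact" if they lie within the cutoff.
            if math.dist(coords[i], coords[j]) <= threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy example with three hypothetical residues on a line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

In this toy example only the first two residues are within the cutoff of each other, so they are the only pair marked as a contact. The diagonal is left at zero by convention in this sketch.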