2015 International Symposium on Mathematical Sciences and Computing Research

Optimized tree-classification algorithm for classification of protein sequences

Abstract

Computational intelligence is an active area of research that has been used successfully to analyze and model the tremendous amount of biological data accumulated by high-throughput genome sequencing projects. The gathered data consist mainly of DNA, RNA and protein sequences, which are imprecise, incomplete and growing exponentially. Classifying protein sequences into superfamilies can help reveal the structure, function or hidden characteristics of an unknown protein sequence. Classifying protein sequences from primary sequence information alone is a complex and challenging task in the analysis and understanding of sequenced data. Existing classification methods perform well on very limited data; however, the rapid growth of genomic data motivates the development of improved computational methods. In this work, we propose an optimized tree-classification technique that uses a cluster k-nearest-neighbor classification algorithm to classify protein sequences into superfamilies. The proposed technique is alignment-free, and the experimental results show that it outperforms previous state-of-the-art approaches. The best overall classification accuracy achieved is 97-98% on a previously used dataset taken from the well-known UniProtKB database.
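
The abstract does not say which features, clustering method or tree structure the authors use, so the following is only a rough illustration of the general idea of alignment-free, cluster-based nearest-neighbor classification, not the paper's algorithm. It assumes amino acid composition as the feature vector, a small k-means step per class to build prototypes, and a majority vote among the k nearest prototypes; the tree component is omitted. All function names, parameters and the toy sequences are hypothetical.

```python
from collections import Counter
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Alignment-free feature (assumed): fraction of each standard amino acid."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, n_clusters, n_iter=20, seed=0):
    """Plain k-means; returns the centroids (assumed clustering step)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, min(n_clusters, len(vectors)))
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for v in vectors:
            idx = min(range(len(centroids)),
                      key=lambda i: distance(v, centroids[i]))
            groups[idx].append(v)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def fit(sequences, labels, clusters_per_class=2):
    """Build labeled prototypes by clustering each class separately."""
    by_class = {}
    for seq, label in zip(sequences, labels):
        by_class.setdefault(label, []).append(composition(seq))
    return [(centroid, label)
            for label, vectors in by_class.items()
            for centroid in kmeans(vectors, clusters_per_class)]


def predict(prototypes, seq, k=1):
    """Majority vote among the k prototypes nearest to the query sequence."""
    x = composition(seq)
    nearest = sorted(prototypes, key=lambda p: distance(x, p[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy, made-up sequences; real input would be UniProtKB entries.
train_seqs = ["AAAAAG", "AAAGAA", "AAAAAA", "LLLLLK", "LLKLLL", "LLLLLL"]
train_labels = ["family_A"] * 3 + ["family_L"] * 3
prototypes = fit(train_seqs, train_labels)
```

Comparing a query against a few prototypes per class rather than every training sequence keeps the cost of a prediction roughly independent of the training set size, which is the usual motivation for cluster-based variants of k-nearest-neighbor.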