Conference paper: International Symposium on Health Informatics and Bioinformatics

Classification of cohesin family using class specific motifs

Abstract

Motif extraction from protein sequences has been a challenging task for bioinformaticians. Class-specific motifs, which occur frequently in one class but rarely in the others, can be used to classify protein sequences with high accuracy. In this study, we present a new scoring-based method for selecting class-specific n-gram motifs using reduced amino acid alphabets. The method is applied to cohesin protein sequences, which interact with Dockerin modules to assemble the cellulosome, a multienzyme complex that degrades cellulose, the most common and abundant organic polymer. The selected motifs are then given as features to the J48 and SVM algorithms. Classification results are examined across various n-gram sizes, reduced amino acid alphabets, and numbers of features. The best result, a training accuracy of 98.61% and a test accuracy of 94.54%, was obtained using the Gbmr14 alphabet, 5 features per family, 4-gram motifs, and the J48 algorithm. The proposed technique can be generalized to other protein families.
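The abstract describes a pipeline: recode sequences in a reduced alphabet, extract n-grams, score each n-gram for class specificity, keep the top few per family, and use their presence as classifier features. The sketch below illustrates that flow under stated assumptions. The six-group alphabet is a hypothetical illustration and is not the Gbmr14 alphabet. The score (the fraction of in-class sequences containing a motif minus the highest such fraction in any other class) is an assumed stand-in, because the abstract does not give the paper's scoring formula. All function names and the toy data are hypothetical.

```python
from collections import defaultdict

# Hypothetical 6-group reduced alphabet for illustration only; NOT the Gbmr14 alphabet.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
REDUCE = {aa: chr(ord("a") + i) for i, group in enumerate(GROUPS) for aa in group}


def reduce_sequence(seq: str) -> str:
    """Map each residue to its group letter; residues outside the alphabet (e.g. X) are dropped."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def ngrams(seq: str, n: int) -> set[str]:
    """Return the set of distinct n-grams occurring in seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def select_motifs(data: dict[str, list[str]], n: int, k: int) -> list[str]:
    """Pick the k most class-specific n-grams per class (assumed scoring, see lead-in)."""
    grams = {c: [ngrams(reduce_sequence(s), n) for s in seqs] for c, seqs in data.items()}
    # freq[c][m] = fraction of sequences in class c that contain motif m
    freq = {}
    for c, sets in grams.items():
        counts = defaultdict(int)
        for g in sets:
            for m in g:
                counts[m] += 1
        freq[c] = {m: cnt / len(sets) for m, cnt in counts.items()}
    selected = []
    for c in data:
        others = [o for o in data if o != c]
        scored = []
        for m, p in freq[c].items():
            # Penalize motifs that are also common in the strongest competing class.
            rival = max((freq[o].get(m, 0.0) for o in others), default=0.0)
            scored.append((p - rival, m))
        # Highest score first; ties broken alphabetically for determinism.
        scored.sort(key=lambda t: (-t[0], t[1]))
        for _, m in scored[:k]:
            if m not in selected:
                selected.append(m)
    return selected


def feature_vector(seq: str, motifs: list[str], n: int) -> list[int]:
    """Binary presence/absence of each selected motif in one sequence."""
    present = ngrams(reduce_sequence(seq), n)
    return [int(m in present) for m in motifs]


if __name__ == "__main__":
    toy = {
        "type_I": ["MKVLDEKR", "MKVLDEGR"],
        "type_II": ["GPSTWYFH", "GPSTWYKR"],
    }
    motifs = select_motifs(toy, n=3, k=2)
    print(motifs)                                    # ['aae', 'ada', 'cbb', 'ccb']
    print(feature_vector("MKVLDEAA", motifs, n=3))  # [1, 1, 0, 0]
```

The resulting binary vectors could be passed to any standard classifier. J48 is Weka's implementation of the C4.5 decision tree, so a decision tree or an SVM from another library would be the nearest analogue in this sketch. Presence/absence features are used here for simplicity; motif counts are an equally plausible choice.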