A Comparative Analysis between k-mers and Community Detection-based Features for the Task of Protein Classification

Abstract

Machine learning algorithms are widely used to annotate biological sequences, and low-dimensional, informative feature vectors can be crucial to their performance. In prior work, we proposed a community detection approach for constructing low-dimensional feature sets for nucleotide sequence classification. That approach used the Hamming distance between short nucleotide subsequences, called k-mers, to construct a network, and then applied community detection to identify groups of k-mers that appear frequently in a set of sequences. Although it worked well for nucleotide sequences, it could not be applied directly to protein sequences, because the Hamming distance, which counts every amino acid mismatch as equally different, is not a good measure for comparing short protein k-mers. To address this limitation, we extended the approach by replacing the Hamming distance with amino acid substitution scores. Experimental results in different learning scenarios (supervised, semi-supervised, and domain adaptation) show that the features generated with the new approach are more informative than k-mer features.
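
The abstract describes the feature construction only at a high level. The following is a minimal Python sketch of the general idea under several assumptions that do not come from the paper: two k-mers are linked when their Hamming distance is at most 1 (nucleotides) or when their summed substitution score is at least 6 (proteins); the substitution table is a small hypothetical one, not a real matrix such as BLOSUM62; and a deterministic variant of label propagation stands in for whichever community detection algorithm the authors actually used. All function names are hypothetical. Each sequence is then represented by how many of its k-mers fall into each community, rather than by one count per distinct k-mer.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Return all overlapping k-mers of seq, left to right."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def hamming(a, b):
    """Number of positions at which two equal-length k-mers differ."""
    return sum(x != y for x, y in zip(a, b))


# HYPOTHETICAL toy substitution scores for illustration only (not BLOSUM62).
# Identical residues score SUB_MATCH; unlisted mismatches score SUB_DEFAULT.
SUB_MATCH = 4
SUB_DEFAULT = -2
SUB_PAIRS = {frozenset("IL"): 2, frozenset("IV"): 3,
             frozenset("KR"): 2, frozenset("DE"): 2}


def sub_score(a, b):
    """Sum of per-position substitution scores between two equal-length k-mers."""
    total = 0
    for x, y in zip(a, b):
        if x == y:
            total += SUB_MATCH
        else:
            total += SUB_PAIRS.get(frozenset((x, y)), SUB_DEFAULT)
    return total


def build_graph(vocab, linked):
    """Undirected k-mer graph: an edge joins every pair (a, b) with linked(a, b) True."""
    adj = {v: set() for v in vocab}
    for a, b in combinations(vocab, 2):
        if linked(a, b):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def label_propagation(adj, max_iter=20):
    """Deterministic label propagation (the original algorithm of Raghavan et al.,
    2007, uses random update order and random tie-breaking).

    Each node, visited in sorted order, adopts the most common label among its
    neighbours, breaking ties by the smallest label. Isolated nodes keep their own
    label. Stops after a pass with no change, or after max_iter passes.
    """
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            new = min(lab for lab, c in counts.items() if c == best)
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels


def community_features(seq, k, labels):
    """One count per community; k-mers of seq that are not in the graph are ignored."""
    communities = sorted(set(labels.values()))
    counts = Counter(labels[m] for m in kmers(seq, k) if m in labels)
    return [counts[c] for c in communities]


if __name__ == "__main__":
    # Protein example: link k-mers whose summed score is >= 6 (hypothetical threshold).
    train = ["MIKE", "MLRE"]
    vocab = sorted({m for s in train for m in kmers(s, 3)})
    graph = build_graph(vocab, lambda a, b: sub_score(a, b) >= 6)
    labels = label_propagation(graph)
    print(labels)  # {'IKE': 'LRE', 'LRE': 'LRE', 'MIK': 'MLR', 'MLR': 'MLR'}
    print(community_features("MLRMIK", 3, labels))  # [0, 2]
```

In this toy example, IKE and LRE differ at two of three positions, so a Hamming-distance threshold of 1 would leave them unlinked, but the conservative substitutions I/L and K/R give them a high summed score, so they end up in the same community. The four distinct protein 3-mers collapse into two community features.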

Bibliographic record

Journal: other
Authors: Karthik Tangirala; Nic Herndon; Doina Caragea
Volume (issue): 15 (2)
Pages: 84–92
Total pages: 20
Original format: PDF
Keywords: Community detection; feature construction; feature selection; dimensionality reduction; protein classification; supervised learning; semi-supervised learning; domain adaptation

Similar documents

1. A Comparative Analysis Between k-Mers and Community Detection-Based Features for the Task of Protein Classification [J]. Karthik Tangirala, Nic Herndon, Doina Caragea. IEEE Transactions on NanoBioscience, 2016, no. 2.

2. Comprehensive comparative analysis and identification of RNA-binding protein domains: Multi-class classification and feature selection [J]. S. Jahandideh, V. Srinivasasainagendra, D. Zhi. Journal of Theoretical Biology, 2012.

3. Feature Analysis of Unsupervised Learning for Multi-task Classification Using Convolutional Neural Network [J]. Jonghong Kim, Waqas Bukhari, Minho Lee. Neural Processing Letters, 2018, no. 3.

4. Community Detection-Based Feature Construction for Protein Sequence Classification [C]. Karthik Tangirala, Nic Herndon, Doina Caragea. International Symposium on Bioinformatics Research and Applications, 2015.

5. Comparative Analysis of Feature Selection and Classification Methods for Epigenetic Methylation Data [D]. Aaron Kleyn, 2021.

6. Comprehensive comparative analysis and identification of RNA-binding protein domains: multi-class classification and feature selection [O]. Samad Jahandideh, Vinodh Srinivasasainagendra, Degui Zhi.

7. An analysis of k-mer frequency features with SVM and CNN for viral subtyping classification [O]. Vicente Enrique Machaca Arceda, 2020.
