Protein Sequence Classification Using Feature Hashing

Abstract

Recent advances in next-generation sequencing technologies have resulted in an exponential increase in protein sequence data. The k-gram representation used for protein sequence classification usually results in prohibitively high-dimensional input spaces for large values of k. Applying data mining algorithms to these input spaces may be intractable because of the large number of dimensions. Hence, dimensionality reduction techniques can be crucial for both the performance and the complexity of the learning algorithms. We study the applicability of feature hashing to protein sequence classification, where the original high-dimensional space is reduced by mapping features to hash keys, such that multiple features can be mapped (at random) to the same key, and their counts are aggregated. We compare feature hashing with the bag of k-grams and feature selection approaches. Our results show that feature hashing is an effective approach to reducing dimensionality on protein sequence classification tasks.
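The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name a hash function, so MD5 is assumed here only because it gives the same bucket on every run, and the function names, the fixed k, the bucket count, and the toy sequence are all hypothetical. With 20 amino acids, a bag of k-grams has up to 20^k dimensions (8,000 for k = 3, 160,000 for k = 4), whereas the hashed vector has a fixed, chosen length.

```python
import hashlib
from collections import Counter


def kgrams(sequence, k):
    """Return all overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def bag_of_kgrams(sequence, k):
    """Count each distinct k-gram (the unreduced representation)."""
    return Counter(kgrams(sequence, k))


def hash_index(feature, num_buckets):
    """Map a k-gram to a bucket in [0, num_buckets).

    MD5 is an assumption of this sketch; any stable hash would do.
    Python's built-in hash() is avoided because it is salted per process.
    """
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hashed_features(sequence, k, num_buckets):
    """Aggregate k-gram counts into a fixed-length vector.

    Distinct k-grams may collide in one bucket; their counts are summed.
    """
    vector = [0] * num_buckets
    for gram, count in bag_of_kgrams(sequence, k).items():
        vector[hash_index(gram, num_buckets)] += count
    return vector


if __name__ == "__main__":
    toy_sequence = "MKVLAAGIVGLLLAQ"  # made-up sequence, for illustration only
    print(bag_of_kgrams(toy_sequence, 3))
    print(hashed_features(toy_sequence, 3, num_buckets=16))
```

Because colliding features only add their counts, the entries of the hashed vector always sum to the number of k-grams in the sequence; what is lost is the ability to tell colliding k-grams apart, which is the trade-off the paper evaluates against the bag of k-grams and feature selection.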

Bibliographic details

Source: IEEE International Conference on Bioinformatics and Biomedicine, 2011, 6 pages
Authors: Cornelia Caragea; Adrian Silvescu; Prasenjit Mitra
Original format: PDF
Chinese Library Classification: Q18-53
Keywords: dimensionality reduction; feature hashing; variable length k-grams

Similar literature

1. Protein sequence classification using feature hashing [J]. Cornelia Caragea, Adrian Silvescu, Prasenjit Mitra. Proteome Science. 2012, Issue S1.
2. Hyperspectral Image Classification Method Based on CNN Architecture Embedding With Hashing Semantic Feature [J]. Yu Chunyan, Zhao Meng, Song Meiping, et al. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 2019, Issue 6.
3. Efficient Multiple Feature Fusion With Hashing for Hyperspectral Imagery Classification: A Comparative Study [J]. Zisha Zhong, Bin Fan, Kun Ding, et al. IEEE Transactions on Geoscience and Remote Sensing. 2016, Issue 8.
4. Protein Sequence Classification Using Feature Hashing [C]. Cornelia Caragea, Adrian Silvescu, Prasenjit Mitra. 2011 IEEE International Conference on Bioinformatics and Biomedicine. 2011.
5. Functional classification of divergent protein sequences and molecular evolution of multi-domain proteins [D]. Strope, Pooja K. 2011.
6. Protein sequence classification using feature hashing [O]. Cornelia Caragea, Adrian Silvescu, Prasenjit Mitra. 2012.
7. Protein Sequence Classification Using Feature Hashing [O]. Cornelia Caragea, Adrian Silvescu, Prasenjit Mitra. 2013.
8. Phonetic Set Hashing: A Novel Scheme for Transforming Phone Sequences to Words [R]. Sarukkai, R. R., Ballard, D. H. 1994.
