BMC Bioinformatics

DectICO: an alignment-free supervised metagenomic classification method based on feature extraction and dynamic selection


Abstract

Background: Continual progress in next-generation sequencing makes it possible to generate increasingly large metagenomes sampled over time or space. Comparing and classifying metagenomes from different microbial communities is therefore critical. Alignment-free supervised classification is important for discriminating between the diverse components of metagenomic samples because it does not depend on known microbial genomes.

Results: We propose an alignment-free supervised metagenomic classification method called DectICO. The intrinsic correlation of oligonucleotides (ICO) provides the feature set, which is selected dynamically using a kernel partial least squares algorithm, and the feature matrices extracted with this set are then used sequentially to train support vector machine (SVM) classifiers. We evaluated the classification performance of DectICO on three real metagenomic sequencing datasets, two containing deeply sequenced metagenomes and one of low coverage. Validation results show that DectICO is effective, performs well with long oligonucleotides (i.e., 6-mers to 8-mers), and is more stable and generalizes better than a sequence-composition-based method. The classifiers trained by our method are more accurate than those obtained with non-dynamic feature selection methods and with a recently published recursive-SVM-based classification approach.

Conclusions: The alignment-free supervised classification method DectICO can accurately classify metagenomic samples without depending on known microbial genomes. Selecting the ICO dynamically offers better stability and generality than sequence-composition-based classification algorithms. Our proposed method provides new insights into metagenomic sample classification.
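To make the oligonucleotide-based features more concrete, the following Python sketch computes a normalized k-mer frequency vector for one sample. This is plain sequence composition, i.e., the kind of baseline feature the abstract compares against, and not the ICO measure itself, whose definition is not given here. The function name `kmer_frequencies`, the toy reads, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(reads, k):
    """Return normalized k-mer frequencies over all reads as a fixed-order vector.

    Hypothetical helper for illustration only: it computes simple sequence
    composition, not the ICO features used by DectICO.
    """
    alphabet = "ACGT"
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Assumption: skip windows containing ambiguous bases such as N.
            if all(base in alphabet for base in kmer):
                counts[kmer] += 1
    total = sum(counts.values())
    # Enumerate all 4**k k-mers in lexicographic order so that every sample
    # yields a vector with the same layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]


# Toy reads, not real sequencing data.
sample_reads = ["ACGTACGT", "TTGACNGA"]
vector = kmer_frequencies(sample_reads, k=2)
```

Stacking one such vector per sample gives a feature matrix with 4^k columns, which is 4,096 for 6-mers and 65,536 for 8-mers. That growth suggests why some form of feature selection is useful before a classifier such as an SVM is trained.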
