Source: U.S. National Institutes of Health literature, Evolutionary Bioinformatics Online

Classification of Metagenomics Data at Lower Taxonomic Level Using a Robust Supervised Classifier

Abstract

As more completely sequenced genomes become available, the taxonomic classification of metagenomic data will benefit greatly from supervised classifiers that can be updated immediately in response to new genomes. Several supervised classifiers have already been developed to assign metagenomic sequences to their source organisms. We have found that existing supervised classifiers usually cannot accurately discriminate between training data from different classes when the data contain outliers. Yet training genomic data (bacterial and archaeal genomes) usually do contain a portion of outliers, which arise from sequencing errors, phage invasions, highly expressed genes, and other sources. These outliers act as noise and hinder the development of classifiers with better prediction accuracy. To solve this problem, we present a robust supervised classifier, weighted support vector domain description (WSVDD), which can eliminate the interference of some outliers in the training genomic data and thereby generate a more accurate data domain description for each taxonomic class. Experimental results demonstrate that WSVDD is more robust than other classifiers on simulated Sanger and 454 reads with different outlier rates. In experiments on simulated metagenomes and real gut metagenomes, WSVDD also achieved better prediction accuracy than other classifiers.
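
The abstract does not give the WSVDD formulation, so the following is only a toy illustration of the general idea of a weighted per-class domain description, not the authors' method. It assumes each class is summarized by a plain sphere (center and radius) in feature space with no kernel, and it assumes a hypothetical weighting rule in which a training point's weight decays with its distance from the coordinate-wise median. The function names `fit_domain` and `classify` and the two-dimensional feature vectors are invented for this sketch.

```python
import math
from statistics import median


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_domain(points):
    """Describe one class as a (center, radius) sphere, down-weighting outliers.

    Simplified stand-in for a weighted domain description: no kernel and no
    optimization, just a weighted center and a weighted mean radius.
    """
    dims = range(len(points[0]))
    # Robust reference point: the coordinate-wise median of the training data.
    med = [median(p[d] for p in points) for d in dims]
    dists = [distance(p, med) for p in points]
    scale = median(dists) or 1.0  # avoid dividing by zero for identical points
    # Assumed weighting rule (not from the paper): far points get small weights.
    weights = [1.0 / (1.0 + (d / scale) ** 2) for d in dists]
    total = sum(weights)
    center = [sum(w * p[d] for w, p in zip(weights, points)) / total for d in dims]
    radius = sum(w * distance(p, center) for w, p in zip(weights, points)) / total
    return center, max(radius, 1e-12)


def classify(query, domains):
    """Return the label whose sphere is nearest to the query, relative to its radius."""
    return min(
        domains,
        key=lambda label: distance(query, domains[label][0]) / domains[label][1],
    )


# Toy training data: class "A" contains one gross outlier at (10, 10).
training = {
    "A": [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.0), (0.0, -0.1), (10.0, 10.0)],
    "B": [(5.0, 5.1), (5.1, 5.0), (4.9, 5.0), (5.0, 4.9)],
}
domains = {label: fit_domain(points) for label, points in training.items()}
print(classify((1.0, 1.0), domains))
```

In this toy data the outlier receives a weight close to zero, so the center of class "A" stays near the origin instead of being pulled toward (10, 10) as an unweighted mean would be.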