Journal: Biomedical Signal Processing and Control

Taxonomic classification of metagenomic sequences from Relative Abundance Index profiles using deep learning


Abstract

We propose a convolutional neural network (CNN) approach based on k-mer representation for the metagenomic fragment classification problem. The proposed model consists of two steps: first, DNA is represented by k-mer frequencies using the Relative Abundance Index (RAI); second, metagenomic fragments are classified with a CNN. RAI scores, serving as DNA fragment representations, are fed to CNN classifiers (CNN-RAI). RAI consists of the over- and under-abundance statistics gathered from the taxon for each k-mer. To compare the performance of CNN-RAI with that of RAIphy, which classifies metagenomic fragments using the same input attributes with an expectation-maximization-based approach, databases representing different metagenomic scenarios were tested. Metagenomic data generated (or simulated) by different next-generation sequencing platforms, namely Illumina technology and Oxford Nanopore MinION, were compiled into shotgun metagenomics or 16S rRNA datasets. The RAI-based method and the CNN models were trained at the genus level on the represented data, with read lengths ranging from 200 to 10,000 bp and k-mer sizes of 3 ≤ k ≤ 7. This is the first use of the RAI score as a spectral representation in a deep learning algorithm, with performance improved by deep learning on each dataset across a range of parameters. The proposed representation was compared with current spectral methods and shown to be competitive on all datasets used in this study.
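The first step, turning a DNA fragment into a k-mer over- and under-abundance profile, can be illustrated with a minimal sketch. The abstract does not give the RAI formula, so the score below is an assumed stand-in: the log-ratio of each k-mer's observed frequency to the frequency expected from single-nucleotide composition. The function names (`kmer_counts`, `abundance_profile`, `profile_vector`) and the pseudocount are hypothetical and are not taken from RAIphy or CNN-RAI.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in ALPHABET for base in kmer):
            counts[kmer] += 1
    return counts


def abundance_profile(seq, k, pseudocount=1.0):
    """Assumed stand-in for an over/under-abundance score (not the RAIphy formula).

    For every possible k-mer, return log(observed / expected), where the
    expected frequency is the product of the single-nucleotide frequencies.
    Positive values mean over-abundance, negative values under-abundance.
    """
    seq = seq.upper()
    counts = kmer_counts(seq, k)
    base_counts = Counter(b for b in seq if b in ALPHABET)
    total_bases = sum(base_counts.values())
    base_freq = {
        b: (base_counts[b] + pseudocount) / (total_bases + 4 * pseudocount)
        for b in ALPHABET
    }
    total_kmers = sum(counts.values())
    n_possible = 4 ** k
    profile = {}
    for letters in product(ALPHABET, repeat=k):
        kmer = "".join(letters)
        observed = (counts[kmer] + pseudocount) / (total_kmers + n_possible * pseudocount)
        expected = math.prod(base_freq[b] for b in kmer)
        profile[kmer] = math.log(observed / expected)
    return profile


def profile_vector(profile):
    """Flatten a profile into a fixed-order list of 4**k values for a classifier."""
    return [profile[kmer] for kmer in sorted(profile)]


if __name__ == "__main__":
    reference = "ACGT" * 20  # toy sequence standing in for a taxon's genome
    vec = profile_vector(abundance_profile(reference, k=3))
    print(len(vec))
```

A vector of length 4^k built this way has a fixed size regardless of read length, which is what makes such a representation suitable as classifier input across the 200 to 10,000 bp range mentioned in the abstract.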
