Journal: PLoS Computational Biology

Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network


Abstract

Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a Python package) and https://github.com/EESI/seq2att (a command line tool).
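
The abstract names two ideas that a small example may make more concrete: an attention layer that weights nucleotide positions within a read, and the aggregation of read-level predictions into a sample-level prediction. The sketch below is illustrative only and is not the Read2Pheno code. The function names (`attention_pool`, `aggregate_reads`), the dot-product scoring against a `context` vector, and the use of a plain mean to combine reads are all assumptions chosen for brevity. The abstract does not specify these details.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden, context):
    """Collapse per-position features into one read embedding.

    hidden  : list of per-position feature vectors (e.g. RNN outputs)
    context : scoring vector (hypothetical stand-in for a learned parameter)
    Returns the weighted-sum embedding and the attention weights.
    """
    # Score each position by its dot product with the context vector.
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    # The read embedding is the attention-weighted sum of position features.
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


def aggregate_reads(read_probs, sample_ids):
    """Average read-level class probabilities within each sample.

    Averaging is an assumption made here; other aggregation rules are possible.
    """
    grouped = defaultdict(list)
    for probs, sid in zip(read_probs, sample_ids):
        grouped[sid].append(probs)
    return {
        sid: [sum(col) / len(rows) for col in zip(*rows)]
        for sid, rows in grouped.items()
    }
```

In such a sketch, positions with large attention weights would be the ones a visualization highlights as informative, and the per-sample averages would serve as the sample-level prediction.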
