Journal: Chemometrics and Intelligent Laboratory Systems

A deep learning framework for sequence-based bacteria type IV secreted effectors prediction

Abstract

The type IV secretion system (T4SS) is a specialized protein delivery system in Gram-negative bacteria that injects proteins (called effectors, T4SEs) directly into the eukaryotic host cytosol and facilitates bacterial infection. Because many T4SEs have been experimentally shown to play important roles in a wide variety of biological activities, identifying them is crucial to understanding host-pathogen interactions and bacterial pathogenesis. However, experimental identification is often time-consuming and expensive. In the post-genomic era, as new proteins are identified in high-throughput mode, it has become imperative to predict new T4SEs from the amino acid sequence alone. In this work we therefore propose DeepT4, a novel deep learning method that classifies any protein sequence as a T4SE or non-T4SE using only its primary sequence. The backbone of the framework is a convolutional neural network (CNN), which automatically extracts T4SE-related features from the 50 N-terminal and 100 C-terminal residues of the protein. We train and test the deep CNN model on a comprehensive dataset spanning multiple bacterial species, obtaining an area under the receiver operating characteristic curve of 0.876 in 5-fold cross-validation and an accuracy of 92.2% on the test set. Moreover, when evaluated on a common independent dataset, DeepT4 outperforms known state-of-the-art sequence-based T4SE prediction methods. We believe that deep learning is a valuable approach to predicting type IV secreted effectors. This study should be useful in elucidating the secretion mechanism of the T4SS and in facilitating hypothesis-driven experimental design and validation.
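
The abstract states only that the CNN takes the 50 N-terminal and 100 C-terminal residues as input; it does not say how those residues are encoded. As a minimal sketch of what such an input step might look like, the Python below cuts the two terminal windows and one-hot encodes them. The function names `terminal_window` and `one_hot`, the `X` padding symbol for short sequences, and the one-hot encoding itself are all assumptions made for illustration, not details taken from DeepT4.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol; encoded as an all-zero row


def terminal_window(seq, n_term=50, c_term=100):
    """Return the first n_term and last c_term residues as one fixed-length string.

    Sequences shorter than a window are padded with PAD (an assumption);
    for short sequences the two windows overlap.
    """
    seq = seq.upper()
    head = seq[:n_term].ljust(n_term, PAD)   # pad on the right of the N-terminal part
    tail = seq[-c_term:].rjust(c_term, PAD)  # pad on the left so the C-terminus stays aligned
    return head + tail


def one_hot(window):
    """Encode each residue as a 20-element 0/1 row; unknown symbols become all zeros."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in window:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    toy = "MKT" * 70  # toy 210-residue string, not a real protein
    features = one_hot(terminal_window(toy))
    print(len(features), len(features[0]))  # 150 20
```

The resulting 150 x 20 matrix is the kind of fixed-size input a 1D convolutional layer could consume; the actual architecture and encoding used by the authors would need to be taken from the full paper.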
