Systematic analysis and prediction of type IV secreted effector proteins by machine learning approaches


Abstract

In the course of infecting their hosts, pathogenic bacteria secrete numerous effectors, namely, bacterial proteins that pervert host cell biology. Many Gram-negative bacteria, including context-dependent human pathogens, use a type IV secretion system (T4SS) to translocate effectors directly into the cytosol of host cells. Various type IV secreted effectors (T4SEs) have been experimentally validated to play crucial roles in virulence by manipulating host cell gene expression and other processes. Consequently, the identification of novel effector proteins is an important step in increasing our understanding of host-pathogen interactions and bacterial pathogenesis. Here, we train and compare six machine learning models, namely, Naïve Bayes (NB), K-nearest neighbor (KNN), logistic regression (LR), random forest (RF), support vector machines (SVMs) and multilayer perceptron (MLP), for the identification of T4SEs using 10 types of selected features and 5-fold cross-validation. Our study shows that: (1) including different but complementary features generally enhances the predictive performance for T4SEs; (2) ensemble models, obtained by integrating individual single-feature models, exhibit significantly improved predictive performance; and (3) the 'majority voting strategy' led to more stable and accurate classification performance when applied to an ensemble learning model built from distinct single features. We further developed a new method to effectively predict T4SEs, Bastion4 (Bacterial secretion effector predictor for T4SS), and we show that our ensemble classifier clearly outperforms two recent prediction tools. In summary, we developed a state-of-the-art T4SE predictor by conducting a comprehensive performance evaluation of different machine learning algorithms along with a detailed analysis of single- and multi-feature selections.
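
The two evaluation ideas named in the abstract, 5-fold cross-validation and majority voting over single-feature models, can be illustrated with a minimal pure-Python sketch. This is not the Bastion4 implementation: the function names, the interleaved fold assignment, and the tie-breaking rule (ties go to the negative class) are all assumptions made for illustration only.

```python
from collections import Counter


def majority_vote(votes):
    """Combine binary labels (1 = T4SE, 0 = non-T4SE) from several models for one protein."""
    counts = Counter(votes)
    # Assumption: a tie is resolved in favour of the negative class.
    return 1 if counts[1] > counts[0] else 0


def ensemble_predict(per_model_predictions):
    """Apply majority voting across models.

    per_model_predictions is a list of equal-length label lists,
    one list per (hypothetical) single-feature model.
    """
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Assumption: samples are assigned to folds in an interleaved way
    (sample i goes to fold i % k); real pipelines usually shuffle and stratify.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Three hypothetical single-feature models voting on three proteins.
    print(ensemble_predict([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))
    for train_idx, test_idx in k_fold_indices(10):
        print(len(train_idx), test_idx)
```

Each protein is labelled a T4SE only when more than half of the models vote for it, so a single model that misfires on one feature type is outvoted by the others.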

Publication details

  • Source
    Briefings in Bioinformatics | 2019, Issue 3 | 21 pages
  • Author affiliations

    the Biomedicine Discovery Institute and the Department of Microbiology at Monash University Australia.;

    the National Engineering Research Center for Equipment and Technology of Cold Strip Rolling College of Mechanical Engineering from Yanshan University China.;

    the College of Information Engineering Northwest A&F University China.;

    the Department of Genetics University of Alabama at Birmingham (UAB) School of Medicine USA.;

    the Department of Genetics and the Informatics Institute University of Alabama at Birmingham (UAB) School of Medicine USA.;

    The University of Melbourne Australia.;

    Central South University China and his master degree in computer science from the University of Melbourne Australia.;

    the College of Information Engineering Northwest A&F University China.;

    Kyoto University Japan.;

    University of Tokyo Japan.;

    the Faculty of Information Technology and director of the Monash Centre for Data Science at Monash University.;

    Monash University and Postdoctoral research at University of Birmingham and the Wellcome Research Laboratories in the UK and at Monash University in Australia.;

    the Biomedicine Discovery Institute and the Department of Biochemistry and Molecular Biology Monash University Australia.;

    La Trobe University Australia.;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Genetics;
  • Keywords

    type IV secreted effector; bioinformatics; sequence analysis; comprehensive performance evaluation; machine learning; feature analysis;

