Source: PLoS Clinical Trials (US National Institutes of Health literature collection)

An optimal set of features for predicting type IV secretion system effector proteins for a subset of species based on a multi-level feature selection approach

Abstract

Type IV secretion systems (T4SSs) are multi-protein complexes found in a number of bacterial pathogens that can translocate proteins and DNA to the host. Most T4SSs function in conjugation and translocate DNA; however, approximately 13% secrete proteins, delivering effector proteins into the cytosol of eukaryotic host cells. Upon entry, these effectors manipulate the host cell's machinery for the pathogen's benefit, which can result in serious illness or death of the host. For this reason, recognition of T4SS effectors has become an important subject. Much previous work has focused on verifying effectors experimentally, a costly endeavor in terms of money, time, and effort. Good predictions of effectors will help to focus experimental validation and decrease testing costs. In recent years, several scoring and machine learning-based methods have been suggested for predicting T4SS effector proteins. These methods have used different sets of features, and their predictions have been inconsistent. In this paper, an optimal set of features is presented for predicting T4SS effector proteins using a statistical approach. A thorough literature search was performed to find features that have been proposed. Feature values were calculated for datasets of known effectors and non-effectors for T4SS-containing pathogens from four genera with a sufficient number of known effectors: Legionella pneumophila, Coxiella burnetii, Brucella spp., and Bartonella spp. The features were ranked, and less important features were filtered out. Correlations between the remaining features were removed, and dimensionality reduction was accomplished using principal component analysis and factor analysis. Finally, the optimal features for each pathogen were chosen by building logistic regression models and evaluating each model. The evaluation of these logistic regression models confirms the effectiveness of the four optimal sets of features, and based on these an optimal set of features is proposed for all T4SS effector proteins.
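
The first two stages of the multi-level selection described in the abstract (ranking features, then removing correlated ones) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ranking score (absolute difference in class means divided by the overall standard deviation), the `top_k` and `max_corr` parameters, the feature names, and the toy values are all assumptions made for illustration. The later stages (principal component analysis, factor analysis, and logistic regression) are omitted.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def rank_features(columns, labels):
    """Order feature names by an assumed score, best first:
    |mean(effectors) - mean(non-effectors)| / overall standard deviation."""
    scores = {}
    for name, values in columns.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        spread = pstdev(values)
        scores[name] = abs(mean(pos) - mean(neg)) / spread if spread else 0.0
    return sorted(scores, key=scores.get, reverse=True)


def select_features(columns, labels, top_k=3, max_corr=0.8):
    """Keep the top_k ranked features, then drop any feature whose absolute
    correlation with an already kept (better-ranked) feature reaches max_corr."""
    kept = []
    for name in rank_features(columns, labels)[:top_k]:
        if all(abs(pearson(columns[name], columns[k])) < max_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: 1 = known effector, 0 = non-effector.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "feat_a": [0.9, 0.8, 0.95, 0.1, 0.2, 0.15],
    "feat_a_noisy": [0.7, 0.9, 1.0, 0.3, 0.1, 0.4],  # strongly correlated with feat_a
    "feat_b": [5.0, 3.0, 4.0, 2.0, 4.0, 3.0],
    "feat_noise": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],    # same mean in both classes
}

selected = select_features(columns, labels)
print(selected)  # ['feat_a', 'feat_b']
```

In this toy run, `feat_noise` is ranked last and falls outside the top three, and `feat_a_noisy` is dropped because it is strongly correlated with the better-ranked `feat_a`. A real pipeline would pass the surviving features on to dimensionality reduction and model fitting.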
