Published in: PLoS One

iterb-PPse: Identification of transcriptional terminators in bacterial by incorporating nucleotide properties into PseKNC


Abstract

A terminator is a DNA sequence that signals RNA polymerase to terminate transcription. Correctly identifying terminators can improve genome annotation; more importantly, it has considerable application value in disease diagnosis and therapy. However, accurate prediction methods are lacking and urgently needed. We therefore propose "iterb-PPse", a terminator prediction method that incorporates 47 nucleotide properties into PseKNC-Ⅰ and PseKNC-Ⅱ and uses Extreme Gradient Boosting (XGBoost) to predict terminators, based on data from Escherichia coli and Bacillus subtilis. In addition to these encodings, we employed three new feature extraction methods, K-pwm, Base-content and Nucleotidepro, to encode the raw samples. A two-step method was applied to select features. When identifying terminators based on the optimized features, we compared five single models and 16 ensemble models. In 100 repetitions of five-fold cross-validation, our method achieved an accuracy of 99.88% on the benchmark dataset, higher than that of the existing state-of-the-art predictor iTerm-PseKNC. Its prediction accuracy on two independent datasets reached 94.24% and 99.45%, respectively. For the convenience of users, we developed software of the same name based on "iterb-PPse". The open software and source code of "iterb-PPse" are available at https://github.com/Sarahyouzi/iterb-PPse.
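PseKNC (pseudo k-tuple nucleotide composition) starts from the normalized frequencies of every possible k-tuple (4^k values for DNA). It then appends "pseudo" correlation terms derived from nucleotide properties. The sketch below computes only that first k-tuple composition part. It omits the property-based correlation terms, which in iterb-PPse use the 47 nucleotide properties. It is not the authors' implementation, and the function name `kmer_composition` is hypothetical.

```python
from itertools import product
from typing import List


def kmer_composition(seq: str, k: int = 2) -> List[float]:
    """Normalized k-tuple nucleotide composition of a DNA sequence.

    Hypothetical helper: returns 4**k frequencies ordered lexicographically
    over "ACGT" (AA, AC, AG, ... for k=2). This is only the composition
    part of PseKNC; the property-based pseudo components are omitted.
    """
    seq = seq.upper()
    # Enumerate all 4**k possible k-tuples in a fixed order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide a window of length k along the sequence (overlapping windows).
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other ambiguity codes
            counts[j] += 1
            total += 1
    # Divide by the number of valid windows so the frequencies sum to 1.
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    vec = kmer_composition("ACGTAC", k=2)
    print(len(vec))  # 16 dinucleotide frequencies
```

For "ACGTAC" the overlapping dinucleotides are AC, CG, GT, TA and AC, so AC gets frequency 0.4 and CG, GT and TA get 0.2 each. In a full PseKNC encoding, this vector would be rescaled and concatenated with weighted correlation factors before being passed to a classifier such as XGBoost.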