Journal: BMC Bioinformatics

SnoReport 2.0: new features and a refined Support Vector Machine to improve snoRNA identification



Abstract

Background: snoReport uses RNA secondary structure prediction combined with machine learning to identify the two main classes of small nucleolar RNAs, the H/ACA box snoRNAs and the C/D box snoRNAs. Here, we present snoReport 2.0, which substantially improves and extends the original method by: extracting new features for both C/D box and H/ACA box snoRNAs; developing a more sophisticated technique for the SVM training phase, with recent data from vertebrate organisms and a careful choice of the SVM parameters C and γ; and using updated versions of the tools and databases used to build the original version of snoReport. To validate the new version and to demonstrate its improved performance, we tested snoReport 2.0 on different organisms.

Results: Results of the training and test phases for H/ACA box and C/D box snoRNAs, in both versions of snoReport, are discussed. Validation on real data was performed to evaluate the predictions of snoReport 2.0. Our program was applied to a set of previously annotated sequences, some of them experimentally confirmed, from humans, nematodes, drosophilids, platypus, chickens and leishmania. We significantly improved the predictions for vertebrates, since the training phase used data from these organisms; H/ACA box snoRNA identification also improved for the other organisms.

Conclusion: We presented snoReport 2.0, which predicts H/ACA box and C/D box snoRNAs and is an efficient method to find true positives and avoid false positives in vertebrate organisms. The H/ACA box snoRNA classifier showed an F-score of 93 % (an improvement of 10 % over the previous version), while the C/D box snoRNA classifier showed an F-score of 94 % (an improvement of 14 %). In addition, both classifiers exhibited performance measures above 90 %. These results show that snoReport 2.0 avoids false positives and false negatives, allowing snoRNAs to be predicted with high quality. In the validation phase, snoReport 2.0 predicted 67.43 % of the vertebrate snoRNAs for both classes. For nematodes and drosophilids, 69 % and 76.67 % of H/ACA box snoRNAs were predicted, respectively, showing that snoReport 2.0 performs well at identifying snoRNAs in vertebrates and also H/ACA box snoRNAs in invertebrate organisms.
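The abstract reports F-scores without stating how they were computed. As a rough illustration only, the sketch below assumes the common F1 variant, the harmonic mean of precision and recall, computed from true positive, false positive and false negative counts. The function name `f_score` and the example counts are hypothetical and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 score from confusion-matrix counts.

    Assumption: F1 (harmonic mean of precision and recall); the abstract
    does not say which F-score variant the authors used.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)  # fraction of predicted snoRNAs that are real
    recall = tp / (tp + fn)     # fraction of real snoRNAs that were predicted
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration, not results from the paper.
print(f"F1 = {f_score(tp=90, fp=10, fn=10):.2f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a classifier can only score highly when it keeps both false positives and false negatives low, which is the property the conclusion emphasizes.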
