Journal: BMC Bioinformatics

Short-read reading-frame predictors are not created equal: sequence error causes loss of signal



Abstract

Background: Gene prediction algorithms (or gene callers) are an essential tool for analyzing shotgun nucleic acid sequence data. Gene prediction is a ubiquitous step in sequence analysis pipelines; it reduces the volume of data by identifying the most likely reading frame for a fragment, permitting the out-of-frame translations to be ignored. In this study we evaluate five widely used ab initio gene-calling algorithms (FragGeneScan, MetaGeneAnnotator (MGA), MetaGeneMark (MGM), Orphelia, and Prodigal) for accuracy on short (75–1000 bp) fragments containing sequence error from previously published artificial data and “real” metagenomic datasets.

Results: While gene prediction tools have similar accuracies when predicting genes on error-free fragments, considerable differences between tools become evident in the presence of sequencing errors. For error-containing short reads, FragGeneScan finds more prokaryotic coding regions than do MetaGeneAnnotator, MetaGeneMark, Orphelia, or Prodigal. This improved detection of genes in error-containing fragments, however, comes at the cost of much lower (50%) specificity and overprediction of genes in noncoding regions.

Conclusions: Ab initio gene callers offer a significant reduction in the computational burden of annotating individual nucleic acid reads and are used in many metagenomic annotation systems. For predicting reading frames on raw reads, we find that the hidden Markov model approach in FragGeneScan is more sensitive than other gene prediction tools, while Prodigal, MGA, and MGM are better suited for higher-quality sequences such as assembled contigs.
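To make the notion of a reading frame concrete, the following minimal Python sketch enumerates the six possible frames of a DNA fragment and shows how a single inserted base shifts the downstream codons into a different frame. It is an illustration only and is not taken from the paper or from any of the evaluated tools; the function names `reverse_complement` and `six_frames` and the 12 bp toy read are assumptions made for this example.

```python
# Illustrative sketch only: not the method used by any gene caller named above.
# Function names and the toy read are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def six_frames(seq):
    """Split a fragment into complete codons for each of the six frames.

    Keys are "+1", "+2", "+3" (forward strand) and "-1", "-2", "-3"
    (reverse-complement strand); values are lists of 3-letter codons.
    """
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Stop at len(s) - 2 so that every codon has exactly three bases.
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[strand + str(offset + 1)] = codons
    return frames


# A toy error-free read whose coding frame is +1.
read = "ATGGCTAAAGGT"
# The same read with one spurious base ("C") inserted after position 5,
# mimicking an insertion-type sequencing error.
read_with_insertion = read[:5] + "C" + read[5:]

print(six_frames(read)["+1"])
print(six_frames(read_with_insertion)["+1"])
print(six_frames(read_with_insertion)["+2"])
```

In this toy case the insertion introduces an in-frame stop codon (`TAA`) in frame +1, while the original downstream codons (`AAA`, `GGT`) reappear in frame +2. A predictor that assumes one uninterrupted frame per fragment would lose the coding signal at that point, which is the kind of effect the abstract attributes to sequence error.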
