Pairagon: a highly accurate HMM-based cDNA-to-genome aligner


Abstract

Motivation: The most accurate way to determine the intron–exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics.

Results: We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created ‘perfect’ simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator, and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat.

We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner.

Availability: Pairagon source and executables are freely available.

Supplementary information: Supplementary information is available at Bioinformatics online.
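The simulation design described in the abstract (splice exons into a ‘perfect’ cDNA, then mutate the genome to lower the sequence identity) can be illustrated with a toy sketch. This is not the authors' pipeline: the function names, the exon coordinates, the random genome and the substitution-only mutation model are all assumptions made for illustration, whereas the paper reports using a realistic mutation simulator on the fly and human reference genomes.

```python
import random


def splice_exons(genome, exons):
    """Concatenate exon intervals (0-based, half-open) into a 'perfect' cDNA."""
    return "".join(genome[start:end] for start, end in sorted(exons))


def mutate(seq, rate, rng):
    """Toy mutation model (assumption): independent substitutions only, no indels."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Replace the base with one of the other three nucleotides.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def percent_identity(a, b):
    """Percentage of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical 300-bp 'genome' with three exons (coordinates are made up).
genome = "".join(rng.choice("ACGT") for _ in range(300))
exons = [(10, 60), (120, 170), (220, 280)]

# 'Perfect' cDNA: exons spliced directly from the unmutated genome.
cdna = splice_exons(genome, exons)

# Mutate the whole genome; substitutions keep exon coordinates valid.
mutated_genome = mutate(genome, 0.15, rng)
mutated_exons = splice_exons(mutated_genome, exons)

identity = percent_identity(cdna, mutated_exons)
print(f"cDNA length: {len(cdna)} bp, identity to mutated exons: {identity:.1f}%")
```

Because this toy model applies substitutions only, the exon coordinates remain valid in the mutated genome, so the identity between the cDNA and its target exons can be read off directly. A simulator that also introduces insertions and deletions would need to track how the exon coordinates shift.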
