Journal: BMC Genomics

MARS: improving multiple circular sequence alignment using refined sequences

Abstract

Background: A fundamental assumption of all widely-used multiple sequence alignment techniques is that the left- and right-most positions of the input sequences are relevant to the alignment. However, the position where a sequence starts or ends can be totally arbitrary for a number of reasons: arbitrariness in the linearisation (sequencing) of a circular molecular structure, or inconsistencies introduced into sequence databases by different linearisation standards. These scenarios are relevant, for instance, in the multiple sequence alignment of mitochondrial DNA, viroid, viral or other genomes, which have a circular molecular structure. A solution for these inconsistencies would be to identify a suitable rotation (cyclic shift) for each sequence; these refined sequences may in turn lead to improved multiple sequence alignments using the preferred multiple sequence alignment program.

Results: We present MARS, a new heuristic method for improving Multiple circular sequence Alignment using Refined Sequences. MARS was implemented in the C++ programming language as a program to compute the rotations (cyclic shifts) required to best align a set of input sequences. Experimental results, using real and synthetic data, show that MARS improves the alignments, with respect to standard genetic measures and the inferred maximum-likelihood-based phylogenies, and outperforms state-of-the-art methods in terms of both accuracy and efficiency. Our results show, among other things, that the average pairwise distance in the multiple sequence alignment of a dataset of widely-studied mitochondrial DNA sequences is reduced by around 5% when MARS is applied before a multiple sequence alignment is performed.

Conclusions: Analysing multiple sequences simultaneously is fundamental in biological research, and multiple sequence alignment is a popular method for this task. Conventional alignment techniques cannot be used effectively when the position where sequences start is arbitrary. We present here a method, which can be used in conjunction with any multiple sequence alignment program, to address this problem effectively and efficiently.
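
To make the notion of a rotation (cyclic shift) concrete, the following is a minimal Python sketch of what "refining" one sequence against another could look like. It is not the MARS heuristic, which the abstract does not describe in detail. It is a brute-force toy that assumes two equal-length sequences and scores each rotation by Hamming distance. The function names `rotate`, `hamming` and `best_rotation` are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def rotate(seq: str, k: int) -> str:
    """Return seq cyclically shifted left by k positions."""
    if not seq:
        return seq
    k %= len(seq)
    return seq[k:] + seq[:k]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_rotation(reference: str, seq: str) -> Tuple[int, int]:
    """Try every rotation of seq and return (shift, distance) for the one
    closest to reference. Ties are broken by the smallest shift.

    Assumption: equal-length sequences and a plain Hamming score; a real
    tool would need to handle insertions, deletions and unequal lengths.
    """
    if len(reference) != len(seq):
        raise ValueError("sequences must have equal length")
    if not seq:
        return 0, 0
    scored = ((hamming(reference, rotate(seq, k)), k) for k in range(len(seq)))
    distance, shift = min(scored)
    return shift, distance


# The same circular molecule linearised at two different start positions.
reference = "ACGTTGCA"
other = "TGCAACGT"
shift, distance = best_rotation(reference, other)
refined = rotate(other, shift)  # the rotation of `other` closest to `reference`
```

In this toy case the refined sequence is identical to the reference, so a conventional aligner would no longer be penalised by the arbitrary start position. The brute-force search costs quadratic time per pair, which is one reason a heuristic is attractive for genome-scale inputs.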
