Journal: Bioinformatics

A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.


Abstract

Motivation: Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes rather than nucleotides or amino acids. Because optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors made in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel, accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists.

Results: Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozen eukaryotic genomes.

Digital Object Identifier: http://dx.doi.org/10.1093/bioinformatics/btr008
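To make the notions of "ordered gene lists" and "alignment conflicts" concrete, the following is a minimal illustrative sketch in Python. It is not the algorithm from the article. It assumes that each genomic segment is a list of gene-family labels, that two genes are homologous exactly when their labels match, and that two homology links conflict when they cross or share a gene, so that both cannot be kept in one collinear alignment. The function names `homology_links` and `conflicting_pairs` are hypothetical.

```python
from itertools import combinations


def homology_links(seg_a, seg_b):
    """Return (i, j) index pairs where seg_a[i] and seg_b[j] carry the same
    gene-family label (assumption: same label means homologous)."""
    return [
        (i, j)
        for i, fam_a in enumerate(seg_a)
        for j, fam_b in enumerate(seg_b)
        if fam_a == fam_b
    ]


def conflicting_pairs(links):
    """Return pairs of links that cannot both appear in a collinear alignment.

    Two distinct links (i1, j1) and (i2, j2) conflict when they cross
    (the gene order is inverted between the segments, as after a
    rearrangement) or when they share a gene (as after a duplication).
    Both cases give (i1 - i2) * (j1 - j2) <= 0.
    """
    return [
        (a, b)
        for a, b in combinations(links, 2)
        if (a[0] - b[0]) * (a[1] - b[1]) <= 0
    ]


# Hypothetical segments: families f2 and f3 are swapped in the second list.
segment_1 = ["f1", "f2", "f3", "f4"]
segment_2 = ["f1", "f3", "f2", "f4"]

links = homology_links(segment_1, segment_2)
conflicts = conflicting_pairs(links)
print(links)      # [(0, 0), (1, 2), (2, 1), (3, 3)]
print(conflicts)  # [((1, 2), (2, 1))]
```

In this toy setting, an aligner would have to drop or reposition one link of each conflicting pair. The article's heuristics for choosing which links to resolve, and in what order, are not reproduced here.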
