On Computing Breakpoint Distances for Genomes with Duplicate Genes

Shao Mingfu; Moret Bernard M. E.

首页> 外文期刊>Journal of computational biology: A journal of computational molecular cell biology >On Computing Breakpoint Distances for Genomes with Duplicate Genes

【24h】

On Computing Breakpoint Distances for Genomes with Duplicate Genes

机译：在计算具有重复基因的基因组的断点距离

获取原文

获取原文并翻译 | 示例

获取外文期刊封面封底 >>

开具论文收录证明 >>

文献代查 >>

团队文献服务 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

A fundamental problem in comparative genomics is to compute the distance between two genomes in terms of its higher level organization (given by genes or syntenic blocks). For two genomes without duplicate genes, we can easily define (and almost always efficiently compute) a variety of distance measures, but the problem is NP-hard under most models when genomes contain duplicate genes. To tackle duplicate genes, three formulations (exemplar, maximum matching, and any matching) have been proposed, all of which aim to build a matching between homologous genes so as to minimize some distance measure. Of the many distance measures, the breakpoint distance (the number of nonconserved adjacencies) was the first one to be studied and remains of significant interest because of its simplicity and model-free property. The three breakpoint distance problems corresponding to the three formulations have been widely studied. Although we provided last year a solution for the exemplar problem that runs very fast on full genomes, computing optimal solutions for the other two problems has remained challenging. In this article, we describe very fast, exact algorithms for these two problems. Our algorithms rely on a compact integer-linear program that we further simplify by developing an algorithm to remove variables, based on new results on the structure of adjacencies and matchings. Through extensive experiments using both simulations and biological data sets, we show that our algorithms run very fast (in seconds) on mammalian genomes and scale well beyond. We also apply these algorithms (as well as the classic orthology tool MSOAR) to create orthology assignment, then compare their quality in terms of both accuracy and coverage. We find that our algorithm for the any matching formulation significantly outperforms other methods in terms of accuracy while achieving nearly maximum coverage.

机译：比较基因组学中的一个基本问题是根据其更高水平的组织（由基因或同期嵌段给出）计算两个基因组之间的距离。对于没有重复基因的两个基因组，我们可以容易地定义（并且几乎总是有效地计算）各种距离测量，但是当基因组含有重复基因时，问题在大多数模型下的问题是NP - 硬。为了解决重复基因，已经提出了三种制剂（示例性，最大匹配和任何匹配），所有这些都旨在在同源基因之间构建匹配，以便最小化一些距离测量。在许多距离措施中，断点距离（非经常性邻接的数量）是第一个要研究的距离，并且由于其简单性和无模型物业，因此是第一个和重大兴趣的遗骸。已经广泛研究了与三种配方对应的三个断点距离问题。虽然我们在去年提供了一个解决方案的解决方案，但在全基因组上运行非常快，虽然在全基因组上运行非常快，但计算其他两个问题的最佳解决方案仍然具有挑战性。在本文中，我们描述了这两个问题的非常快速，精确的算法。我们的算法依赖于一个紧凑的整数线程，我们通过开发算法来删除变量的算法，基于新的结果对邻接和匹配的结构进行了新的结果。通过使用两种模拟和生物数据集进行广泛的实验，我们表明我们的算法在哺乳动物基因组上运行非常快（以秒为单位），并远远缩放。我们还将这些算法（以及经典的正轨工具MsoAL）应用于创建原子大学分配，然后在精度和覆盖范围内比较它们的质量。我们发现，我们的任何匹配配方的算法在准确性方面显着优于其他方法，同时实现了几乎最大的覆盖率。

著录项

来源
《Journal of computational biology: A journal of computational molecular cell biology 》 |2017年第6期| 共10页
作者
Shao Mingfu; Moret Bernard M. E.;
展开▼
作者单位

Ecole Polytech Fed Lausanne Sch Comp &

Commun Sci Lab Computat Biol &

Bioinformat Lausanne Switzerland;

Ecole Polytech Fed Lausanne Sch Comp &

Commun Sci Lab Computat Biol &

Bioinformat Lausanne Switzerland;

展开▼
收录信息
原文格式 PDF
正文语种 eng
中图分类生物数学方法 ;
关键词
breakpoint distance; exemplar; gene family; ILP; intermediate; maximum matching; orthology assignment;

机译：断点距离;示例;基因家族;ILP;中间;最大匹配;正式作业;

相似文献

外文文献
中文文献
专利

1. On Computing Breakpoint Distances for Genomes with Duplicate Genes [J] . Shao Mingfu, Moret Bernard M. E. Journal of computational biology: A journal of computational molecular cell biology . 2017 ,第6期

机译：在计算具有重复基因的基因组的断点距离
2. An Exact Algorithm to Compute the Double-Cut-and-Join Distance for Genomes with Duplicate Genes [J] . Shao Mingfu, Lin Yu, Moret Bernard M. E. Journal of computational biology: A journal of computational molecular cell biology . 2015 ,第5期

机译：计算重复基因组双切入距离的精确算法
3. An Exact Algorithm to Compute the Double-Cut-and-Join Distance for Genomes with Duplicate Genes [J] . LinYu, MoretBernard M.E., ShaoMingfu Journal of computational biology . 2015 ,第5期

机译：计算具有重复基因的基因组的双切入连接距离的精确算法
4. On Computing Breakpoint Distances for Genomes with Duplicate Genes [C] . Mingfu Shao, Bernard M.E. Moret Annual international conference on research in computational molecular biology . 2016

机译：关于具有重复基因的基因组的断点距离计算
5. Gene content evolution in plant genomes: Studies of whole genome duplication, intergenic transcription and expression evolution in brassicaceae and poaceae species. [D] . Moghe, Gaurav Dilip. 2013

机译：植物基因组中的基因含量演变：研究芸苔科和禾本科物种中的全基因组复制，基因间转录和表达演变。
6. Approximating the edit distance for genomes with duplicate genes under DCJ insertion and deletion [O] . Mingfu Shao, Yu Lin 2012

机译：DCJ插入和缺失下具有重复基因的基因组的编辑距离近似
7. A Pseudo-Boolean programming approach for computing the breakpoint distance between two genomes with duplicate genes [O] . Angibaud Sébastien, Fertin Guillaume, Rusu Irena, 2007

机译：一种伪布尔编程方法，用于计算两个具有重复基因的基因组之间的断点距离

On Computing Breakpoint Distances for Genomes with Duplicate Genes

摘要

著录项

相似文献

相关主题

期刊订阅