
RNA Transcript Assembly Using Inexact Flows



Abstract

RNA-Seq technology allows for high-throughput, low-cost measurement of gene expression. An important step in this process is the assembly of short mRNA reads into full transcripts. The problem can be viewed as a flow decomposition problem in which the objective is to minimize the number of path flows needed to represent a given flow. In this work we relax the edge flow constraints to allow for some uncertainty in their measurement. We formulate this as the Inexact Flow Decomposition problem and propose an algorithmic strategy to solve it. In practice, real biological data contain measurement errors, so experimentally derived edge-weighted splice graphs are often not flows. The proposed method is the first approach to this problem that explicitly controls the error allowed on each edge of these graphs in order to achieve a flow. In an intermediate step, the method solves an exact flow decomposition instance; if a greedy method is used for this step, the overall running time is $O(|E|^2 |V|^2 + |P|^3)$, where $P$ is the solution found for the flow decomposition instance. Preliminary results on simulated biological data sets show that in many cases the ground-truth paths can be recovered at approximately correct abundances, even with noisy input data.
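To make the flow decomposition step concrete, here is a minimal Python sketch of one common greedy heuristic: repeatedly peel off the source-to-sink path with the largest bottleneck flow. The abstract does not say which greedy method the authors use, so the choice of this "greedy width" heuristic is an assumption. The names `widest_path`, `greedy_width_decomposition` and `within_tolerance` are hypothetical. The tolerance check is only an illustrative reading of "error allowed on each edge", not the paper's formulation.

```python
from collections import defaultdict


def widest_path(flow, source, sink, order):
    """Find the source-to-sink path with the largest bottleneck in a DAG.

    `order` is a topological order of the nodes (assumed to be supplied).
    Returns (bottleneck, path); the bottleneck is 0 if no path remains.
    """
    best = {source: (float("inf"), None)}  # node -> (bottleneck, predecessor)
    for u in order:
        if u not in best:
            continue
        for v, f in flow[u].items():
            if f <= 0:
                continue  # edge already used up
            width = min(best[u][0], f)
            if v not in best or width > best[v][0]:
                best[v] = (width, u)
    if sink not in best:
        return 0, []
    path, node = [], sink
    while node is not None:
        path.append(node)
        node = best[node][1]
    return best[sink][0], path[::-1]


def greedy_width_decomposition(edges, source, sink, order):
    """Decompose an exact flow into weighted paths, widest path first."""
    flow = defaultdict(dict)
    for (u, v), f in edges.items():
        flow[u][v] = f
    paths = []
    while True:
        width, path = widest_path(flow, source, sink, order)
        if width <= 0:
            break
        for u, v in zip(path, path[1:]):
            flow[u][v] -= width  # remove this path's flow from the graph
        paths.append((width, path))
    return paths


def within_tolerance(paths, measured, tol):
    """Check that the summed path flow on each measured edge is within +/- tol.

    Illustrative assumption: one uniform tolerance for all edges.
    Edges absent from `measured` are not checked.
    """
    total = defaultdict(int)
    for weight, path in paths:
        for u, v in zip(path, path[1:]):
            total[(u, v)] += weight
    return all(abs(total[e] - m) <= tol for e, m in measured.items())


# A toy splice graph carrying an exact flow of 6 units from "s" to "t".
example_edges = {("s", "a"): 6, ("a", "b"): 4, ("a", "c"): 2,
                 ("b", "t"): 4, ("c", "t"): 2}
example_paths = greedy_width_decomposition(
    example_edges, "s", "t", order=["s", "a", "b", "c", "t"])

# Noisy measurements: edge ("a", "b") reads 5, so they do not form a flow.
noisy = dict(example_edges)
noisy[("a", "b")] = 5
```

Here `example_paths` contains two paths, `s-a-b-t` with weight 4 and `s-a-c-t` with weight 2. Against the noisy measurements, `within_tolerance(example_paths, noisy, 1)` holds while a tolerance of 0 does not. This greedy heuristic does not guarantee a minimum number of paths.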
