Conference: Annual International Conference on Research in Computational Molecular Biology

Efficient Calculation of Interval Scores for DNA Copy Number Data Analysis


Abstract

Background. DNA amplifications and deletions characterize cancer genomes and are often related to disease evolution. Microarray-based techniques for measuring these DNA copy-number changes use fluorescence ratios at arrayed DNA elements (BACs, cDNA or oligonucleotides) to provide signals at high resolution in terms of genomic location. These data are then further analyzed to map aberrations and boundaries and to identify biologically significant structures. Methods. We develop a statistical framework that enables the casting of several DNA copy number data analysis questions as optimization problems over real-valued vectors of signals. The simplest form of the optimization problem seeks to maximize a score $\varphi(I)$ over all subintervals $I$ in the input vector. We present and prove a linear-time approximation scheme for this problem: namely, a process with time complexity $O(n\varepsilon^{-2})$ that outputs an interval for which $\varphi(I)$ is at least $\mathrm{Opt}/\alpha(\varepsilon)$, where $\mathrm{Opt}$ is the actual optimum and $\alpha(\varepsilon) \to 1$ as $\varepsilon \to 0$. We further develop practical implementations that improve the performance of the naive quadratic approach by orders of magnitude. We discuss properties of optimal intervals and how they apply to the algorithm performance. Examples. We benchmark our algorithms on synthetic as well as publicly available DNA copy number data. We demonstrate the use of these methods for identifying aberrations in single samples as well as common alterations in fixed sets and subsets of breast cancer samples.
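
To make the "naive quadratic approach" mentioned in the abstract concrete, here is a minimal Python sketch of an exhaustive scan over all subintervals. The abstract as extracted here does not state the score function, so the sketch assumes a score of the form (sum of the signals in the interval) divided by the square root of the interval length; that choice, along with the function name `max_interval_score`, is an illustrative assumption. It is the quadratic baseline only, not the linear-time approximation scheme the abstract describes.

```python
import math
from itertools import accumulate


def max_interval_score(values):
    """Exhaustively find the subinterval with the highest score.

    ASSUMPTION: score(I) = sum(values in I) / sqrt(len(I)). The abstract
    does not spell out the score, so this form is illustrative only.

    Returns (score, start, end) with `end` exclusive. Runs in O(n^2) time.
    """
    if not values:
        raise ValueError("values must be non-empty")

    # prefix[k] holds sum(values[:k]), so any interval sum is one subtraction.
    prefix = [0.0, *accumulate(values)]
    n = len(values)

    best = (-math.inf, 0, 0)
    for start in range(n):
        for end in range(start + 1, n + 1):
            score = (prefix[end] - prefix[start]) / math.sqrt(end - start)
            if score > best[0]:
                best = (score, start, end)
    return best


if __name__ == "__main__":
    # Toy signal with a raised segment at positions 2 and 3.
    print(max_interval_score([0, 0, 4, 4, 0]))
```

Under the same assumed score, negating the input vector would make the scan look for the most strongly decreased segment instead of the most strongly increased one.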
