BMC Bioinformatics

An efficient pseudomedian filter for tiling microarrays

Abstract

Background: Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to identifying novel transcripts, mapping transcription factor binding sites, and detecting methylated DNA, among other tasks, in several model organisms. As microarray feature densities increase, these experiments are being conducted at ever finer resolutions, which in turn raises data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis smooths the observed signal by computing pseudomedians within sliding windows, an O(n² log n) calculation per window of n values. This poor time complexity is already an issue for tiling array analysis and could become a real bottleneck as tiling microarray experiments grow in scope and resolution.

Results: We therefore implemented Monahan's HLQEST algorithm, which reduces the runtime complexity of computing the pseudomedian of n numbers from O(n² log n) to O(n log n). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then exploited the fact that overlapping windows share most of their elements as the window slides across genomic space, which reduced computation by an additional 43%. This was achieved by using skip lists to maintain a sorted list of values from window to window; the sorted list can be kept up to date with simple O(log n) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets.

Conclusion: Tiling microarray analyses that rely on a sliding-window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds up the current standard analyses but also makes feasible analyses that require many iterations of the filter, as in a bootstrap or parameter estimation setting. Source code and executables are available at http://tiling.gersteinlab.org/pseudomedian/.
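The abstract combines two ideas: computing each window's pseudomedian (the median of the n(n+1)/2 Walsh averages (x_i + x_j)/2, i ≤ j) faster than by brute force, and keeping the window sorted as it slides instead of re-sorting it. The following is a minimal Python sketch of both, under stated assumptions. Windows are taken to be w consecutive values (the parameter `w` is hypothetical; real pipelines may instead define windows by genomic distance). The fast routine is a randomized selection over the implicitly sorted matrix of pairwise sums, similar in spirit to, but not a transcription of, Monahan's HLQEST; it should need an expected O(log n) passes of O(n) work each. A plain sorted list maintained with `bisect` stands in for the skip list. All function names are illustrative, not those of the released tool.

```python
import bisect
import random
from itertools import combinations_with_replacement
from statistics import median


def pseudomedian_naive(values):
    """Baseline: median of all n(n+1)/2 Walsh averages, O(n^2 log n)."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(values, 2)]
    return median(walsh)


def _kth_pair_sum(xs, k, rng):
    """k-th smallest (1-based) of xs[i] + xs[j] over i <= j, for sorted xs.

    Randomized selection over the implicit upper-triangular matrix of pair
    sums, whose rows and columns are sorted. Each pass ranks a random pivot
    with an O(n) two-pointer sweep, then shrinks the per-row active ranges.
    """
    n = len(xs)
    lo = list(range(n))  # lo[i]: first still-active column in row i
    hi = [n - 1] * n     # hi[i]: last still-active column in row i
    while True:
        # Pick a pivot uniformly from the entries still in play.
        sizes = [max(0, hi[i] - lo[i] + 1) for i in range(n)]
        r = rng.randrange(sum(sizes))
        row = 0
        while r >= sizes[row]:
            r -= sizes[row]
            row += 1
        pivot = xs[row] + xs[lo[row] + r]
        # first_ge[i] / first_gt[i]: first column j with xs[i] + xs[j] >= / > pivot.
        # Both are non-increasing in i because xs is sorted.
        first_ge, first_gt = [0] * n, [0] * n
        j_ge = j_gt = n
        for i in range(n):
            while j_ge > 0 and xs[i] + xs[j_ge - 1] >= pivot:
                j_ge -= 1
            while j_gt > 0 and xs[i] + xs[j_gt - 1] > pivot:
                j_gt -= 1
            first_ge[i], first_gt[i] = j_ge, j_gt
        # Count entries (j >= i only) strictly below and at-or-below the pivot.
        n_lt = sum(max(0, first_ge[i] - i) for i in range(n))
        n_le = sum(max(0, first_gt[i] - i) for i in range(n))
        if n_lt < k <= n_le:
            return pivot
        if k <= n_lt:  # target < pivot: discard every entry >= pivot
            hi = [min(hi[i], first_ge[i] - 1) for i in range(n)]
        else:          # target > pivot: discard every entry <= pivot
            lo = [max(lo[i], first_gt[i]) for i in range(n)]


def pseudomedian_sorted(xs, rng=None):
    """Pseudomedian of an already sorted, non-empty sequence."""
    if not xs:
        raise ValueError("pseudomedian of an empty sequence")
    if rng is None:
        rng = random.Random(0)
    total = len(xs) * (len(xs) + 1) // 2
    if total % 2:
        return _kth_pair_sum(xs, (total + 1) // 2, rng) / 2
    a = _kth_pair_sum(xs, total // 2, rng)
    b = _kth_pair_sum(xs, total // 2 + 1, rng)
    return (a + b) / 4


def sliding_pseudomedian(signal, w):
    """Pseudomedian of every run of w consecutive values in signal.

    The window stays sorted as it slides: one delete and one insert per
    step. A bisect-maintained list stands in for the skip list, so each
    update shifts O(w) elements instead of costing O(log w).
    """
    if not 1 <= w <= len(signal):
        raise ValueError("w must be between 1 and len(signal)")
    rng = random.Random(0)
    window = sorted(signal[:w])
    out = [pseudomedian_sorted(window, rng)]
    for t in range(w, len(signal)):
        del window[bisect.bisect_left(window, signal[t - w])]  # value leaving
        bisect.insort(window, signal[t])                       # value entering
        out.append(pseudomedian_sorted(window, rng))
    return out


print(pseudomedian_naive([1, 2, 10]), pseudomedian_sorted([1, 2, 10]))  # 3.75 3.75
```

For [1, 2, 10] the six Walsh averages are 1, 1.5, 2, 5.5, 6 and 10, so the pseudomedian is (2 + 5.5)/2 = 3.75. The sorted-window version works in pair sums and halves at the end, which keeps the comparisons exact for integer data. Swapping the list for a skip list or balanced tree would give the O(log w) updates the abstract describes.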