首页> 外文会议>International workshop on algorithms in bioinformatics >Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers
【24h】

Efficient Design of Compact Unstructured RNA Libraries Covering All k-mers

机译:涵盖所有k-mers的紧凑型非结构化RNA文库的高效设计

获取原文
获取外文期刊封面目录资料

摘要

Current microarray technologies to determine RNA structure or measure protein-RNA interactions rely on single-stranded, unstructured RNA probes on a chip covering together all k-mers. Since space on the array is limited, the problem is to efficiently design a compact library of unstructured ℓ-long RNA probes, where each k-mer is covered at least p times. Ray et al. designed such a library for specific values of k, ℓ and p using ad-hoc rules. To our knowledge, there is no general method to date to solve this problem. Here, we address the problem of finding a minimum-size covering of all k-mers by ℓ-long sequences with the desired properties for any value of k, ℓ and p. As we prove that the problem is NP-hard, we give two solutions: the first is a greedy algorithm with a logarithmic approximation ratio; the second, a heuristic greedy approach based on random walks in de Bruijn graphs. The heuristic algorithm works well in practice and produces a library of unstructured RNA probes that is only ~ 1.1-times greater in size compared to the theoretical lower bound. We present results for typical values of k and probe lengths ℓ and show that our algorithm generates a library that is significantly smaller than the library of Ray et al.; moreover, we show that our algorithm outperforms naive methods. Our approach can be generalized and extended to generate RNA or DNA oligo libraries with other desired properties. The software is freely available on curlcake.csail.mit.edu.
机译:当前用于确定RNA结构或测量蛋白质-RNA相互作用的微阵列技术依赖于覆盖所有k-mers的芯片上的单链,非结构化RNA探针。由于阵列上的空间有限,问题在于有效设计紧凑的无结构λ长RNA探针文库,其中每个k-mer至少覆盖p次。雷等。设计了一个使用ad-hoc规则的k,ℓ和p特定值的库。据我们所知,迄今为止尚无通用的方法来解决此问题。在这里,我们解决的问题是,通过对所有k 、,和p值都具有所需属性的long-长序列,找到所有k-mers的最小尺寸覆盖。当我们证明问题是NP问题时,我们给出两个解决方案:第一个是对数近似比的贪婪算法;第二个是对数算法。第二种是基于de Bruijn图中随机游动的启发式贪婪方法。该启发式算法在实践中效果很好,并且生成了一个非结构化RNA探针库,其大小仅比理论下限大了约1.1倍。我们给出了k和探针长度typical的典型值的结果,并表明我们的算法生成的库明显小于Ray等人的库。此外,我们证明了我们的算法优于单纯的方法。我们的方法可以推广并扩展为生成具有其他所需属性的RNA或DNA寡核苷酸文库。该软件可在curlcake.csail.mit.edu上免费获得。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号