Conference: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

Compressing population DNA sequences using multiple reference sequences

Abstract

Compressing population DNA sequences often relies on a reference sequence, so that only the differences between the target sequences and the reference are encoded. Despite the importance of this choice, state-of-the-art population sequence compression algorithms often select one of the population sequences as the reference in an ad hoc manner. In this paper, we investigate how the reference sequence should be chosen. In particular, the population sequences are first clustered into a number of groups. A reference sequence is then obtained for each group so that it characterizes the substructure within that group, and it is used to compress the sequences in that group. In this way, the multiple-reference framework can optimize overall compression performance on the set of population sequences. Results show that the proposed method reduces the compressed size by up to 91% compared with state-of-the-art reference-based approaches.
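The abstract does not say which clustering algorithm is used, how each group's reference is obtained, or how differences are encoded. The following Python sketch is therefore a hypothetical illustration of the general idea, not the authors' method. It assumes aligned, equal-length sequences, substitution-only differences, Hamming distance, and naive k-medoids clustering, with each cluster's medoid serving as that group's reference (a swapped-in choice; the paper may construct references differently). All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Diff = List[Tuple[int, str]]  # (position, substituted base)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes aligned sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def encode_diff(target: str, reference: str) -> Diff:
    """Substitution-only diff: record (position, base) wherever target differs."""
    return [(i, t) for i, (t, r) in enumerate(zip(target, reference)) if t != r]


def decode_diff(diff: Diff, reference: str) -> str:
    """Rebuild the target by applying the recorded substitutions to the reference."""
    seq = list(reference)
    for i, base in diff:
        seq[i] = base
    return "".join(seq)


def cluster_k_medoids(seqs: List[str], k: int, iters: int = 20) -> Dict[int, List[int]]:
    """Naive k-medoids under Hamming distance; returns {medoid index: member indices}."""
    if not 1 <= k <= len(seqs):
        raise ValueError("k must be between 1 and the number of sequences")
    medoids = list(range(k))  # deterministic init: the first k sequences
    clusters: Dict[int, List[int]] = {}
    for _ in range(iters):
        # Assignment step: each sequence joins its nearest medoid (first one on ties).
        clusters = {m: [] for m in medoids}
        for i, s in enumerate(seqs):
            nearest = min(medoids, key=lambda m: hamming(s, seqs[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(hamming(seqs[c], seqs[j]) for j in members))
            for members in clusters.values()
            if members  # drop clusters that lost all members
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters


def compress_population(seqs: List[str], k: int) -> Dict[int, Dict[int, Diff]]:
    """Keep one reference per cluster; every other member becomes a diff against it."""
    return {
        ref: {j: encode_diff(seqs[j], seqs[ref]) for j in members if j != ref}
        for ref, members in cluster_k_medoids(seqs, k).items()
    }


def diff_entries(encoded: Dict[int, Dict[int, Diff]]) -> int:
    """Crude size proxy: total number of substitutions that must be stored."""
    return sum(len(d) for group in encoded.values() for d in group.values())


population = [
    "AAAAAAAA", "AAAAAAAC", "AAAAAACC",  # one toy sub-population
    "GGGGGGGG", "GGGGGGGT", "GGGGGGTT",  # another toy sub-population
]
single = compress_population(population, k=1)
multi = compress_population(population, k=2)
print(diff_entries(single), diff_entries(multi))  # prints "26 4"
```

On this toy population, a single reference forces the sequences from the other sub-population to be stored almost entirely as substitutions, while one reference per cluster leaves only one substitution per non-reference sequence. The proxy ignores the cost of storing the extra reference itself (two 8-base references instead of one) and any entropy coding of the diffs, both of which a real compressor would have to account for.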