DNA-COMPACT: DNA COMpression Based on a Pattern-Aware Contextual Modeling Technique


Abstract

Genome data are becoming increasingly important for modern medicine. As DNA sequencing output grows faster than disk storage capacity, storing and transferring large genome data sets has become a major concern for biomedical researchers. We propose a two-pass lossless genome compression algorithm that improves compression performance by combining complementary contextual models. The proposed framework handles genome compression both with and without reference sequences and outperforms the best existing algorithms. In reference-free mode, the method achieved bit rates of 1.720 and 1.838 bits per base for bacteria and yeast, respectively, approximately 3.7% and 2.6% better than state-of-the-art algorithms. For reference-based compression, we tested the method on the first Korean personal genome sequence data set: it achieved a 189-fold compression ratio, reducing the raw file size from 2986.8 MB to 15.8 MB, with decompression cost comparable to that of existing algorithms. DNAcompact is freely available for research purposes.
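The abstract names the idea of combining complementary context models but gives no details of how DNA-COMPACT does it. As a rough illustration of where a "bits per base" figure comes from, here is a minimal sketch under stated assumptions, not the authors' method: two adaptive order-k models (orders 2 and 12, chosen arbitrarily) with add-one counts are mixed by Bayesian weighting, and the code reports the ideal code length, -log2 p per base, that an arithmetic coder would approach. It assumes the input contains only uppercase A, C, G, T; the names `OrderKModel` and `bits_per_base` are hypothetical.

```python
import math
from collections import defaultdict

BASES = "ACGT"


class OrderKModel:
    """Hypothetical adaptive order-k model: counts of the next base given the previous k bases."""

    def __init__(self, k: int):
        self.k = k
        # Add-one (Laplace) prior so no base ever gets probability zero.
        self.counts = defaultdict(lambda: [1, 1, 1, 1])

    def _key(self, context: str) -> str:
        # Use the last k bases as the context (shorter at the start of the sequence).
        return context[-self.k:] if self.k > 0 else ""

    def predict(self, context: str) -> list[float]:
        c = self.counts[self._key(context)]
        total = sum(c)
        return [x / total for x in c]

    def update(self, context: str, base_idx: int) -> None:
        self.counts[self._key(context)][base_idx] += 1


def bits_per_base(seq: str, orders=(2, 12)) -> float:
    """Ideal code length in bits per base for a Bayesian mixture of order-k models.

    Assumption: seq contains only A, C, G, T (no N or lowercase).
    """
    models = [OrderKModel(k) for k in orders]
    weights = [1.0 / len(models)] * len(models)
    max_order = max(orders)
    total_bits = 0.0
    for i, base in enumerate(seq):
        b = BASES.index(base)
        context = seq[max(0, i - max_order):i]
        preds = [m.predict(context) for m in models]
        # Mixed probability of the actual base under the current weights.
        p_mix = sum(w * p[b] for w, p in zip(weights, preds))
        total_bits += -math.log2(p_mix)
        # Bayesian update: scale each weight by the probability its model gave the actual base.
        weights = [w * p[b] for w, p in zip(weights, preds)]
        s = sum(weights)
        weights = [w / s for w in weights]
        for m in models:
            m.update(context, b)
    return total_bits / len(seq)
```

A highly repetitive sequence such as `"ACGT" * 1000` scores far below 2 bits per base, while a uniformly random sequence stays close to 2, the cost of storing each base naively in 2 bits. The mixture can never do much worse than its best component (at most about log2 of the number of models extra bits over the whole sequence), which is one simple reason combining complementary models is attractive.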
