Journal: Bioinformatics

G-SQZ: compact encoding of genomic sequence and quality data.


Abstract

SUMMARY: Large volumes of data generated by high-throughput sequencing instruments present non-trivial challenges in data storage, content access and transfer. We present G-SQZ, a Huffman coding-based sequencing-reads-specific representation scheme that compresses data without altering the relative order. G-SQZ has achieved from 65% to 81% compression on benchmark datasets, and it allows selective access without scanning and decoding from start. This article focuses on describing the underlying encoding scheme and its software implementation, and a more theoretical problem of optimal compression is out of scope. The immediate practical benefits include reduced infrastructure and informatics costs in managing and analyzing large sequencing data.

AVAILABILITY: http://public.tgen.org/sqz. Academic/non-profit: Source: available at no cost under a non-open-source license by requesting from the web-site; Binary: available for direct download at no cost. For-Profit: Submit request for for-profit license from the web-site.
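The abstract names two ideas without detailing them: Huffman coding of sequencing reads, and selective access without decoding from the start. The sketch below illustrates both in Python under stated assumptions. It is not the G-SQZ file format or implementation. It assumes each symbol is a (base, quality) pair, and it uses a hypothetical per-read table of bit offsets to reach one read directly. The function names are invented for this example, and the bitstream is kept as a string of '0' and '1' characters for readability, where a real encoder would pack bits into bytes.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(symbols):
    """Return a {symbol: bitstring} prefix code built from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(n, next(tie), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def encode_reads(reads, codes):
    """Encode reads in their original order and record each read's bit offset."""
    chunks, offsets, pos = [], [], 0
    for bases, quals in reads:
        bits = "".join(codes[pair] for pair in zip(bases, quals))
        offsets.append(pos)  # hypothetical index entry, an assumption of this sketch
        chunks.append(bits)
        pos += len(bits)
    return "".join(chunks), offsets


def decode_read(bitstream, start, length, codes):
    """Decode one read of `length` symbols beginning at bit offset `start`."""
    inverse = {code: sym for sym, code in codes.items()}
    bases, quals, current, pos = [], [], "", start
    while len(bases) < length:
        current += bitstream[pos]
        pos += 1
        if current in inverse:  # prefix-free codes make the first match unambiguous
            base, qual = inverse[current]
            bases.append(base)
            quals.append(qual)
            current = ""
    return "".join(bases), "".join(quals)


# Toy reads: (bases, quality string), as in a FASTQ record.
reads = [("ACGTAC", "IIIIH#"), ("ACGGAC", "IIHII#"), ("TTGTAC", "IIIIII")]
pairs = [p for bases, quals in reads for p in zip(bases, quals)]
codes = build_huffman_codes(pairs)
bitstream, offsets = encode_reads(reads, codes)

# Selective access: decode only the second read, starting at its recorded offset.
second = decode_read(bitstream, offsets[1], len(reads[1][0]), codes)
```

Because frequent (base, quality) pairs receive shorter codes, the encoded stream is typically smaller than a fixed-width representation, and the read order is preserved since reads are encoded in sequence. The offset table trades a small amount of extra storage for direct access to any read.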
