Conference: IEEE International Parallel and Distributed Processing Symposium

Towards Context-Aware DNA Sequence Compression for Efficient Data Exchange



Abstract

DNA sequencing has emerged as one of the principal research directions in systems biology because of its usefulness in predicting the provenance of disease, and it also has a profound impact on other fields such as biotechnology, biological systematics and forensic medicine. High-throughput DNA sequencing experiments are notorious for generating sequences in huge quantities, which poses a challenge for the computation, storage and exchange of sequence data. Computing on the Cloud helps mitigate the first two challenges because it provides on-demand machines, which reduce cost, and the flexibility to balance the load in terms of both computation and storage. The data exchange problem can be mitigated to an extent through data compression. This work proposes a context-aware framework that selects the compression algorithm that minimizes time-to-completion and uses resources efficiently, based on experiments with different Cloud and algorithm combinations and configurations. The results obtained from this framework and experimental setup show that DNAX is better than the rest of the algorithms in any context, but if the file size is less than 50 kb, CTW or Gencompress is also an option. The Gzip algorithm, which is used in the NCBI repository to store the sequences, has the worst compression ratio and time.
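The selection rule reported in the abstract, and the two metrics it compares (compression ratio and time), can be illustrated with a minimal Python sketch. This is not the authors' framework: the function names `candidate_algorithms` and `measure_gzip` are hypothetical, "50 kb" is assumed to mean 50 * 1024 bytes, and only Gzip is actually run because it is the only algorithm named in the abstract that ships with the Python standard library.

```python
import gzip
import time

# Assumption: "50 kb" in the abstract is read as 50 * 1024 bytes.
SMALL_FILE_BYTES = 50 * 1024


def candidate_algorithms(file_size_bytes: int) -> list[str]:
    """Return the algorithm names the abstract reports as suitable for a file size.

    Hypothetical helper: it only encodes the abstract's stated finding and
    does not run DNAX, CTW or Gencompress.
    """
    if file_size_bytes < SMALL_FILE_BYTES:
        return ["DNAX", "CTW", "Gencompress"]
    return ["DNAX"]


def measure_gzip(data: bytes) -> tuple[float, float]:
    """Return (compression ratio, elapsed seconds) for Gzip on `data`."""
    start = time.perf_counter()
    compressed = gzip.compress(data)
    elapsed = time.perf_counter() - start
    # Ratio is original size divided by compressed size; higher is better.
    return len(data) / len(compressed), elapsed


if __name__ == "__main__":
    # Synthetic, highly repetitive sequence; real sequencing data compresses less well.
    sample = b"ACGTTGCA" * 4096
    ratio, seconds = measure_gzip(sample)
    print(f"gzip ratio={ratio:.2f}, time={seconds:.4f}s")
    print(candidate_algorithms(len(sample)))
```

A real comparison would time each candidate compressor on the same input and in the same Cloud configuration; the sketch only shows how the two metrics can be collected for one algorithm.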
