Genome Research

A privacy-preserving solution for compressed storage and selective retrieval of genomic data



Abstract

In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients' complete aligned genomic data in addition to variant calls relative to a reference sequence. Because human genome sequence data files are large (from 30 GB to 200 GB, depending on coverage), two major challenges facing genomics laboratories are the cost of storage and the efficiency of the initial data processing. In addition, the privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solution exists that enables compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared with BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based instead of read-based, and (2) it allows random querying of a subregion from a BAM-like file in encrypted form. Our method thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data.
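The abstract says SECRAM is position-based and supports random querying of an encrypted subregion, but it does not describe the on-disk layout. The following Python sketch shows one generic way such selective retrieval can work, under our own assumptions. Per-position records are grouped into fixed-size blocks of reference coordinates, and each block is compressed and then encrypted independently. A plaintext index maps block numbers to byte offsets, so a region query decrypts only the blocks it overlaps. `BLOCK_SIZE`, `build_store`, and `query` are hypothetical names. `zlib` stands in for reference-based compression, and the HMAC-SHA256 keystream is a toy substitute for a standard cipher such as AES. The sketch also omits authentication, and it is not the SECRAM format.

```python
import hashlib
import hmac
import json
import os
import zlib

# Hypothetical parameter: number of reference positions grouped into one encrypted block.
BLOCK_SIZE = 1000


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over (nonce || counter). Stand-in for AES-CTR; not for real data."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def build_store(columns: dict[int, str], key: bytes) -> tuple[bytes, dict[int, tuple[int, int, bytes]]]:
    """Bucket per-position records by reference position, then compress and encrypt each bucket."""
    blocks: dict[int, dict[int, str]] = {}
    for pos, record in columns.items():
        blocks.setdefault(pos // BLOCK_SIZE, {})[pos] = record
    payload = bytearray()
    index: dict[int, tuple[int, int, bytes]] = {}  # block id -> (offset, length, nonce)
    for block_id in sorted(blocks):
        plain = zlib.compress(json.dumps(blocks[block_id]).encode())  # compress before encrypting
        nonce = os.urandom(16)  # fresh nonce per block so keystreams are not reused
        cipher = _xor(plain, _keystream(key, nonce, len(plain)))
        index[block_id] = (len(payload), len(cipher), nonce)
        payload += cipher
    return bytes(payload), index


def query(payload: bytes, index: dict[int, tuple[int, int, bytes]], key: bytes,
          start: int, end: int) -> dict[int, str]:
    """Return records for positions in [start, end), decrypting only the overlapping blocks."""
    result: dict[int, str] = {}
    for block_id in range(start // BLOCK_SIZE, (end - 1) // BLOCK_SIZE + 1):
        if block_id not in index:
            continue  # nothing stored for this stretch of the reference
        offset, length, nonce = index[block_id]
        cipher = payload[offset:offset + length]
        plain = zlib.decompress(_xor(cipher, _keystream(key, nonce, length)))
        for pos_str, record in json.loads(plain).items():  # JSON object keys come back as strings
            pos = int(pos_str)
            if start <= pos < end:
                result[pos] = record
    return result
```

For example, `query(payload, index, key, 2100, 2200)` decrypts only block 2, regardless of how many other blocks the store holds. The block size is the main design lever. Smaller blocks give finer-grained random access, while larger blocks compress better and add less per-block overhead. Note that in this sketch the plaintext index still reveals which regions contain data.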
