Journal: Cloud Computing, IEEE Transactions on

Securing Aggregate Queries for DNA Databases



Abstract

This paper addresses the problem of sharing person-specific genomic sequences without violating the privacy of their data subjects to support large-scale biomedical research projects. The proposed method builds on the framework proposed by Kantarcioglu et al. [1] but extends the results in a number of ways. One improvement is that our scheme is deterministic, with zero probability of a wrong answer (as opposed to a low probability). We also provide a new operating point in the space-time tradeoff, by offering a scheme that is twice as fast as theirs but uses twice the storage space. This point is motivated by the fact that storage is cheaper than computation in current cloud computing pricing plans. Moreover, our encoding of the data makes it possible for us to handle a richer set of queries than exact matching between the query and each sequence of the database, including: (i) counting the number of matches between the query symbols and a sequence; (ii) logical OR matches where a query symbol is allowed to match a subset of the alphabet, thereby making it possible to handle (as a special case) a "not equal to" requirement for a query symbol (e.g., "not a G"); (iii) support for the extended alphabet of nucleotide base codes that encompasses ambiguities in DNA sequences (this happens on the DNA sequence side instead of the query side); (iv) queries that specify the number of occurrences of each kind of symbol in the specified sequence positions (e.g., two 'A' and four 'C' and one 'G' and three 'T', occurring in any order in the query-specified sequence positions); (v) a threshold query whose answer is 'yes' if the number of matches exceeds a query-specified threshold (e.g., "7 or more matches out of the 15 query-specified positions"). (vi) For all query types, we can hide the answers from the decrypting server, so that only the client learns the answer. (vii) In all cases, the client deterministically learns only the query's answer, except for query type (v) where we quantify the (very small) statistical leakage to the client of the actual count.
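
To make the query types concrete, the following is a minimal plaintext sketch of the semantics of query types (i), (ii), (iv), and (v). It is not the paper's scheme: there is no encryption here, and the function names, the dictionary-based query representation, the toy database, and the reading of "aggregate" as "number of sequences satisfying the predicate" are all assumptions made for illustration. The threshold test uses "at least t", following the abstract's "7 or more" example.

```python
from collections import Counter

ALPHABET = "ACGT"  # plain alphabet only; the extended codes of (iii) are not modeled

def count_matches(seq, query):
    """Types (i)/(ii): query maps a position to the SET of symbols allowed there.
    A one-element set is an exact match; a larger set is a logical OR."""
    return sum(1 for pos, allowed in query.items() if seq[pos] in allowed)

def threshold_match(seq, query, t):
    """Type (v): 'yes' if at least t of the queried positions match (assumed >=)."""
    return count_matches(seq, query) >= t

def composition_match(seq, positions, wanted):
    """Type (iv): the queried positions contain exactly wanted[s] copies of each
    symbol s, in any order."""
    counts = Counter(seq[p] for p in positions)
    return all(counts[s] == wanted.get(s, 0) for s in ALPHABET)

def aggregate(db, predicate):
    """Assumed aggregate: how many sequences in db satisfy the predicate."""
    return sum(1 for seq in db if predicate(seq))

# Hypothetical toy database and query.
db = ["ACGTAC", "ACGTTT", "TCGAAC", "GGGTAC"]
not_t = set(ALPHABET) - {"T"}              # "not a T" expressed as an OR over A, C, G
query = {0: {"A"}, 1: {"C"}, 3: not_t}

per_sequence = [count_matches(s, query) for s in db]
at_least_two = aggregate(db, lambda s: threshold_match(s, query, 2))
one_of_each = aggregate(
    db, lambda s: composition_match(s, [0, 1, 2, 3], {"A": 1, "C": 1, "G": 1, "T": 1})
)
```

In the setting the abstract describes, these predicates would be evaluated over encoded and encrypted data so that the servers do not learn the sequences or, per item (vi), the answers; the sketch only fixes what each query type is asking.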
