As the cost of human full genome sequencing continues to fall, we will soon witness a prodigious amount of human genomic data in the public cloud. To protect the confidentiality of the genetic information of individuals, the data has to be encrypted at rest. On the other hand, encryption severely hinders the use of this valuable information, such as Genome-wide Range Query (GRQ), in medical/genomic research. While the problem of secure range query on outsourced encrypted data has been extensively studied, the current schemes are far from practical deployment in terms of efficiency and scalability due to the data volume in human genome sequencing. In this paper, we investigate the problem of secure GRQ over human raw aligned genomic data in a third-party outsourcing model. Our solution contains a novel secure range query scheme based on multi-keyword symmetric searchable encryption (MSSE). The proposed scheme incurs minimal ciphertext expansion and computation overhead. We also present a hierarchical GRQ-oriented secure index structure tailored for efficient and large-scale genomic data lookup in the cloud while preserving the query privacy. Our experiment on real human genomic data shows that a secure GRQ request with range size 100,000 over more than 300 million encrypted short reads takes less than 3 minutes, which is orders of magnitude faster than existing solutions.
展开▼