DNA data embedding is a newly emerging field aspiring to encode data in deoxyribonucleic acid (DNA). DNA is an inherently digital and noisy medium, undergoing substitution, insertion and deletion mutations. Hence, encoding information in DNA can be seen as a particular case of digital communications in which biological constraints must be observed. In this paper we propose a modification of Davey and MacKay's watermark synchronisation code (unrelated to digital watermarking) to create an encoding procedure more biocompatible with the host organism than previous methods. In addition, when combined with a low-density parity-check (LDPC) code, the method provides near-optimum error correction. We also obtain the theoretical embedding capacity of DNA under substitution mutations for the increased biocompatibility constraint. This result, along with an existing bound on capacity for insertion and deletion mutations, is compared to the proposed algorithm's performance by means of Monte Carlo simulations.