In this paper, we first propose coding techniques for DNA-based data storage that account for both the maximum homopolymer runlength and the GC-content. In particular, for arbitrary ℓ, ϵ > 0, we propose simple and efficient (ℓ,ϵ)-constrained encoders that transform binary sequences into DNA base sequences (codewords) satisfying the following properties:
• Runlength constraint: the maximum homopolymer run in each codeword is at most ℓ.
• GC-content constraint: the GC-content of each codeword is within [0.5−ϵ, 0.5+ϵ].
For practical values of ℓ and ϵ, our codes achieve higher rates than existing results in the literature. We further design efficient (ℓ,ϵ)-constrained codes with error-correction capability. Specifically, the designed codes satisfy both the runlength constraint and the GC-content constraint, and can correct a single edit (i.e., a single deletion, insertion, or substitution) and its variants. To the best of our knowledge, no such codes had been constructed prior to this work.
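To make the two (ℓ,ϵ) constraints concrete, the following is a minimal Python sketch of a *checker* that tests whether a given DNA string is a valid (ℓ,ϵ)-constrained codeword. It is not the encoder proposed in this paper. The function names (`max_homopolymer_run`, `gc_content`, `satisfies_constraints`) are hypothetical, and the sketch assumes codewords are strings over the alphabet {A, C, G, T}, with GC-content defined as the fraction of symbols that are G or C.

```python
# Hypothetical checker for the (ell, eps) constraints; illustration only,
# not the encoder described in the paper.

DNA_ALPHABET = set("ACGT")


def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    longest = 0
    current = 0
    prev = None
    for base in seq:
        # Extend the current run if the base repeats, otherwise start a new run.
        current = current + 1 if base == prev else 1
        prev = base
        longest = max(longest, current)
    return longest


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("GC-content is undefined for an empty sequence")
    return sum(1 for base in seq if base in "GC") / len(seq)


def satisfies_constraints(seq: str, ell: int, eps: float) -> bool:
    """Check the runlength constraint (run <= ell) and the GC-content
    constraint (GC-content within [0.5 - eps, 0.5 + eps])."""
    if not set(seq) <= DNA_ALPHABET:
        raise ValueError("sequence must use only the bases A, C, G, T")
    runlength_ok = max_homopolymer_run(seq) <= ell
    # Note: eps is a float, so boundary cases are subject to rounding.
    gc_ok = abs(gc_content(seq) - 0.5) <= eps
    return runlength_ok and gc_ok


if __name__ == "__main__":
    print(satisfies_constraints("ACGTACGT", ell=3, eps=0.1))  # run 1, GC 0.5
    print(satisfies_constraints("AAAACGTT", ell=3, eps=0.5))  # run of 4 A's
    print(satisfies_constraints("ACGCGCGC", ell=3, eps=0.1))  # GC 0.875
```

Running the script prints `True`, `False`, `False`: the first string meets both constraints, the second violates the runlength constraint for ℓ = 3, and the third violates the GC-content constraint for ϵ = 0.1. An encoder of the kind described above must guarantee that every codeword it outputs passes both checks.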