De Bruijn graphs play a key role in many bioinformatics tools as a data structure that efficiently represents the overlaps between sequences. Given a set of sequences S, the nodes of the de Bruijn graph are the k-mers (subsequences of length k) present in S. Two nodes u and v are connected by a directed arc when S contains a (k+1)-mer whose first k nucleotides coincide with u and whose last k nucleotides coincide with v [1]. Often, linear (i.e. non-branching) chains of nodes are contracted into a single node referred to as a unitig [2].
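The definition above can be illustrated with a small Python sketch. It is not the method of any particular tool and rests on a few assumptions: sequences are plain uppercase strings, only the forward strand is considered (practical assemblers usually also merge each k-mer with its reverse complement), and the function names `build_dbg` and `unitigs` are hypothetical. Here a unitig is taken to be a maximal path in which every arc u → v satisfies out-degree(u) = 1 and in-degree(v) = 1.

```python
from collections import defaultdict


def build_dbg(sequences, k):
    """Build a de Bruijn graph: nodes are the k-mers of the input, and an arc
    u -> v exists for every (k+1)-mer whose k-prefix is u and k-suffix is v.
    Assumption: forward strand only, no reverse complements."""
    nodes = set()
    edges = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            nodes.add(seq[i:i + k])
        for i in range(len(seq) - k):
            kp1 = seq[i:i + k + 1]          # a (k+1)-mer observed in the input
            edges[kp1[:-1]].add(kp1[1:])    # arc from its prefix to its suffix
    return nodes, dict(edges)


def unitigs(nodes, edges):
    """Contract maximal non-branching paths into unitig strings."""
    preds = defaultdict(set)
    for u, vs in edges.items():
        for v in vs:
            preds[v].add(u)

    def compactible_in(v):
        # True if v has a single predecessor u whose only successor is v
        # (self-loops are excluded so they never get merged).
        if len(preds.get(v, ())) != 1:
            return False
        (u,) = preds[v]
        return u != v and len(edges.get(u, ())) == 1

    def next_node(u):
        # Successor of u if the arc u -> v can be contracted, else None.
        succ = edges.get(u, set())
        if len(succ) != 1:
            return None
        (v,) = succ
        return v if compactible_in(v) else None

    visited = set()

    def walk(start):
        path = [start]
        visited.add(start)
        v = next_node(start)
        while v is not None and v not in visited:
            path.append(v)
            visited.add(v)
            v = next_node(v)
        # Spell the path: the first k-mer plus the last base of each later k-mer.
        return path[0] + "".join(x[-1] for x in path[1:])

    result = []
    # A unitig starts at every node that cannot be merged with its predecessor.
    for v in sorted(nodes):
        if not compactible_in(v):
            result.append(walk(v))
    # Any node still unvisited lies on an isolated cycle of contractible arcs.
    for v in sorted(nodes):
        if v not in visited:
            result.append(walk(v))
    return result


if __name__ == "__main__":
    nodes, edges = build_dbg(["ACGTA", "ACGTC"], k=3)
    print(unitigs(nodes, edges))  # ['ACGT', 'GTA', 'GTC']
```

In this example the node CGT has two successors (GTA and GTC), so the chain ACG → CGT is contracted into the unitig ACGT while the two branches remain separate unitigs. Adjacent unitigs overlap by k − 1 characters (ACGT and GTA share GT) because each unitig is spelled from its complete first k-mer.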