In this paper, we present two new algorithms for discovering monad patterns in DNA sequences. Monad patterns are of the form (l,d)-k, where l is the length of the pattern, d is the maximum number of mismatches allowed, and k is the minimum number of times the pattern is repeated in the given sample. The time-complexity of some of the best known algorithms to date is O(nt~2 l~d σ~d), where t is the number of input sequences, n is the length of each input sequence, and σ = |Σ| is the size of the alphabet. The first algorithm that we present in this paper takes O(n~2 t~2 l~(d/2)) time and O(ntl~(d/2) σ~(d/2)) space, and the second algorithm takes O(n~3 t~3 l~(d/2) σ~(d/2)) time using O(l~(d/2) σ~(d/2)) space. In practice, our algorithms have much better performance provided the d/l ratio is small. The second algorithm performs very well even for large values l and d as long as the d/l ratio is small.
展开▼