We consider the problem of matching a set P of gapped patterns against a given text T of length n, where a gapped pattern is a sequence of strings (keywords), over a finite alphabet Σ of size σ, such that there is a gap of fixed length between each two consecutive strings.We assume the RAM model, with words of size w in bits.We are interested in computing the list ofmatching patterns for each position in the text. This problem is a specific instance of the Variable Length Gaps problem [2] (VLG problem) for multiple patterns and has applications in the discovery of transcription factor (TF) binding sites in DNA sequences when using generalized versions of the PositionWeightMatrix (PWM) model to representTF binding specificities. The paper [5] describes howa motif represented as a generalizedPWM can bematched as a set of gapped patternswith unit-length keywords, and presents algorithms for the restricted case of patterns with two unit-length keywords.
展开▼