We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet Σ. A motif G = g_1g_2 ... g_m is a string of m characters. Each background sequence is implanted a randomly generated approximate copy of G. For a randomly generated approximate copy b_1b_2 … b_m of G, every character is randomly generated such that the probability for b_i ≠ g_i is at most α. In this paper, we give the first analytical proof that multiple background sequences do help for finding subtle and faint motifs.
展开▼