This paper studies the 2-interval pattern matching problem for {<, ⊏}-structured patterns (precedence and containment) and applies it to scanning for ncRNAs without pseudoknots. Vialette [6] gave an O(mn^3 log n)-time solution to the problem, where m and n are the numbers of intervals in the pattern and in the given 2-interval set, respectively. This solution, however, is not practical for scanning secondary structures on a genome-wide or chromosome-wide scale. We propose an efficient algorithm that solves the problem in O(mn log n) time. To capture more characteristics of the secondary structures of ncRNA families, we define a new problem that adds distance constraints between the intervals, and we show that it can still be solved without increasing the time complexity. Experiments show that the method for the newly defined problem produces far fewer false positives. Moreover, if the only possible base pairs are {(A,U), (C,G), (U,G)}, as is the case for RNA molecules, the time complexity can be further improved to O(mq), where q is the length of the input RNA sequence. In our experiments, the new method requires a reasonable amount of time (2.5 min) to scan a whole chromosome for an ncRNA family.
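To make the terminology concrete, the following is a minimal Python sketch of a 2-interval, of the two relations the abstract refers to as "<" and "is contained in", and of a base-pair check restricted to the pairs listed above. It is not the paper's algorithm. The names `TwoInterval`, `precedes`, `nested_in` and `can_pair` are hypothetical, and the sketch assumes closed integer positions, equal-length arms, and base pairs that are valid in either orientation.

```python
from typing import NamedTuple


class TwoInterval(NamedTuple):
    """Two disjoint closed intervals [i_start, i_end] < [j_start, j_end]."""
    i_start: int
    i_end: int
    j_start: int
    j_end: int


def precedes(a: TwoInterval, b: TwoInterval) -> bool:
    """a < b: both arms of a lie strictly to the left of b."""
    return a.j_end < b.i_start


def nested_in(a: TwoInterval, b: TwoInterval) -> bool:
    """a is contained in b: a lies strictly between the two arms of b."""
    return b.i_end < a.i_start and a.j_end < b.j_start


# Assumption: each listed pair is accepted in both orientations.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("C", "G"), ("G", "C"),
    ("U", "G"), ("G", "U"),
}


def can_pair(seq: str, d: TwoInterval) -> bool:
    """Check that the two arms of d form a stem using only allowed pairs.

    The left arm is read 5'->3' and the right arm 3'->5', so the first base
    of the left arm pairs with the last base of the right arm.
    """
    left = seq[d.i_start:d.i_end + 1]
    right = seq[d.j_start:d.j_end + 1]
    if len(left) != len(right):
        return False
    return all((x, y) in ALLOWED_PAIRS for x, y in zip(left, reversed(right)))


if __name__ == "__main__":
    seq = "GGGAAAUCCC"
    outer = TwoInterval(0, 2, 7, 9)   # GGG paired with CCC
    inner = TwoInterval(3, 3, 6, 6)   # A paired with U
    print(nested_in(inner, outer), can_pair(seq, outer), can_pair(seq, inner))
```

A pattern without pseudoknots uses only these two relations between its 2-intervals, so crossing arrangements are never needed in such a sketch.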