Pattern discovery, or the search for frequently occurring subsequences (called sequential patterns) in sequences, is a well-known data-mining task. Sequences of events occur naturally in many domains. We address an abstract version of the problem of finding frequent sequences of page accesses in a log file by considering the problem of finding frequent subsequences in a sequence dataset. In the abstract problem, we use the 26 uppercase letters to represent the possible web pages and examine the problem of finding frequently occurring subsequences of items in a very long sequence. The particular problem studied is to find all frequently occurring substrings of length K or less in a very long string. The advantages of the Heuristic Depth-First (HDF) algorithm, which builds on the Depth-First (DF) algorithm, are explained by comparing it with the Breadth-First (BF) algorithm.
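To make the problem statement concrete, here is a minimal sketch of a generic depth-first enumeration of frequent substrings of length K or less. It is not the paper's DF or HDF algorithm; it only illustrates the depth-first idea with the standard pruning rule that a substring cannot occur more often than its prefix. The function name `frequent_substrings` and the parameter `min_support` (an absolute occurrence count) are hypothetical, and overlapping occurrences are assumed to count separately.

```python
from collections import defaultdict


def frequent_substrings(text: str, k: int, min_support: int) -> dict[str, int]:
    """Return every substring of length 1..k occurring at least min_support times.

    Hypothetical helper for illustration. Occurrences are counted at every
    start position, so overlapping matches each count ("AA" occurs twice in "AAA").
    """
    result: dict[str, int] = {}

    def extend(prefix: str, starts: list[int]) -> None:
        # Stop descending once the length limit K is reached.
        if len(prefix) == k:
            return
        # Group the current prefix's occurrences by the letter that follows each one.
        by_next: dict[str, list[int]] = defaultdict(list)
        offset = len(prefix)
        for start in starts:
            pos = start + offset
            if pos < len(text):
                by_next[text[pos]].append(start)
        for letter in sorted(by_next):
            child_starts = by_next[letter]
            # Pruning: an infrequent substring has no frequent extensions,
            # so the whole branch is skipped.
            if len(child_starts) >= min_support:
                child = prefix + letter
                result[child] = len(child_starts)
                extend(child, child_starts)  # depth-first: finish this branch first

    # The empty prefix "occurs" at every position of the text.
    extend("", list(range(len(text))))
    return result
```

For example, `frequent_substrings("ABABCABAB", 3, 2)` returns `{'A': 4, 'AB': 4, 'ABA': 2, 'B': 4, 'BA': 2, 'BAB': 2}`; `C` and every extension of it are pruned after the first level. A breadth-first variant would instead count all frequent substrings of length 1, then all of length 2, and so on, keeping a whole level in memory at once.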