A method, and corresponding apparatus, for analysing text in a document comprising a plurality of textual units, the method comprising: receiving the document; partitioning the text into sequences of textual units; comparing sequences from the document with pre-determined sequences from a sequence store; determining similarity measures dependent on differences between sequences from the document and sequences from the sequence store, the similarity measures being dependent on how many unit operations are required to make the sequences from the document the same as the sequences from the sequence store; updating a results store in respect of sequences having similarity measures indicative of degrees of similarity above a pre-determined threshold; and providing an output document comprising tags indicative of such similarities.
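
The steps recited above can be illustrated with a minimal sketch. It is not the claimed implementation; it rests on several assumptions that the abstract does not state: textual units are taken to be whitespace-separated words, sequences are fixed-size non-overlapping windows, the unit operations are insertion, deletion and substitution (a Levenshtein-style edit distance), the similarity measure is normalised to the range 0 to 1, and the tag syntax `<match ...>` is invented for illustration. All function names (`edit_distance`, `similarity`, `partition`, `analyse`) are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count unit operations (insert, delete, substitute) needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, unit_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty sequence takes i deletions
        for j, unit_b in enumerate(b, start=1):
            cost = 0 if unit_a == unit_b else 1
            curr.append(min(prev[j] + 1,          # delete unit_a
                            curr[j - 1] + 1,      # insert unit_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed measure: 1.0 for identical sequences, 0.0 for entirely different ones."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def partition(text: str, size: int) -> List[List[str]]:
    """Split text into words, then into non-overlapping windows of `size` words."""
    units = text.split()
    return [units[i:i + size] for i in range(0, len(units), size)]


def analyse(text: str, store: List[List[str]], size: int = 4,
            threshold: float = 0.7) -> Tuple[str, List[Tuple[int, int, float]]]:
    """Return the tagged output text and a results list.

    Each result is (document sequence index, store sequence index, score).
    """
    results: List[Tuple[int, int, float]] = []
    pieces: List[str] = []
    for doc_idx, seq in enumerate(partition(text, size)):
        # Best-matching stored sequence; (0.0, -1) if the store is empty.
        score, store_idx = max(
            ((similarity(seq, s), i) for i, s in enumerate(store)),
            default=(0.0, -1),
        )
        chunk = " ".join(seq)
        if score > threshold:
            results.append((doc_idx, store_idx, score))
            chunk = f'<match ref="{store_idx}" score="{score:.2f}">{chunk}</match>'
        pieces.append(chunk)
    return " ".join(pieces), results


if __name__ == "__main__":
    sequence_store = [["the", "quick", "red", "fox"]]
    tagged, found = analyse("the quick brown fox jumps over the lazy dog",
                            sequence_store)
    print(tagged)
    print(found)
```

In this sketch the first window differs from the stored sequence by a single substitution, so its score of 0.75 exceeds the assumed threshold of 0.7 and it is tagged; the remaining windows are left untagged. A practical system would likely use overlapping windows or sentence-level partitioning, which the abstract leaves open.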