Hypertext links are a powerful extension of standard information retrieval techniques based on query languages. However, generating links is often impractical because of the large manual and/or computational effort involved. We analyze the effects of two main approaches to reducing this effort: using OCR-processed documents directly instead of manually post-processed (i.e., corrected) documents, and using shorter excerpts of documents instead of complete documents. In our tests, similarity links were computed using the vector-space model; the links generated from unmodified OCR documents and from document excerpts were then compared with the links generated from complete documents without OCR errors.
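As an illustration of the kind of comparison described above, the following is a minimal sketch of link generation in a vector-space model. The abstract does not state the term weighting, similarity measure, or link criterion, so the TF-IDF weights, cosine similarity, the `threshold` parameter, and all function names here are assumptions made for illustration only, not the authors' method.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of texts.

    Assumption: whitespace tokenization and tf * log(N / df) weighting.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_links(docs, threshold=0.1):
    """Return the set of index pairs (i, j), i < j, whose similarity
    reaches the (hypothetical) threshold."""
    vectors = tfidf_vectors(docs)
    links = set()
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                links.add((i, j))
    return links


def link_overlap(reference, candidate):
    """Precision and recall of candidate links against reference links."""
    common = reference & candidate
    precision = len(common) / len(candidate) if candidate else 1.0
    recall = len(common) / len(reference) if reference else 1.0
    return precision, recall
```

Under these assumptions, the experiment would amount to calling `similarity_links` once on the corrected full texts (the reference) and once on the raw OCR texts or on excerpts, then passing both link sets to `link_overlap`. OCR errors such as "1inks" for "links" reduce the shared vocabulary between documents, which is one way links present in the reference set can go missing.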