Although word-based methods are commonly used in document retrieval, they cannot be directly applied to languages that have no obvious word separator. Given a lexicon, it is possible to identify words in documents, but a large lexicon is troublesome to maintain and makes retrieval systems large and complicated. This paper proposes an effective and efficient ranking method that does not use a large lexicon; words need not be identified during document registration because a character-based signature file is used as the access structure. During document retrieval, a user request is statistically analyzed to generate an appropriate query, and the query is evaluated efficiently in a word-based manner using the character-based index. We also propose two optimization techniques to accelerate retrieval.
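To make the idea of a character-based signature file concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes superimposed coding over character bigrams, a fixed signature width, and a substring check to discard false drops; the names `SignatureFile`, `SIG_BITS`, and `HASHES` are hypothetical, and the statistical query generation and ranking steps are not modeled.

```python
import hashlib

SIG_BITS = 256  # signature width in bits (assumed value)
HASHES = 3      # bits set per character bigram (assumed value)


def bigrams(text):
    """Return the set of overlapping character bigrams; no word segmentation is needed."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bit_positions(gram):
    """Map one bigram to HASHES bit positions derived from a SHA-256 digest."""
    digest = hashlib.sha256(gram.encode("utf-8")).digest()
    return [int.from_bytes(digest[4 * k:4 * k + 4], "big") % SIG_BITS
            for k in range(HASHES)]


def signature(text):
    """Superimpose the bits of every bigram into a single integer signature."""
    sig = 0
    for gram in bigrams(text):
        for pos in bit_positions(gram):
            sig |= 1 << pos
    return sig


class SignatureFile:
    """Hypothetical in-memory index: one signature per registered document."""

    def __init__(self):
        self.docs = []
        self.sigs = []

    def register(self, text):
        """Index a document without identifying its words; return its id."""
        self.docs.append(text)
        self.sigs.append(signature(text))
        return len(self.docs) - 1

    def search(self, term):
        """Return ids of documents containing the term as a character string."""
        query = signature(term)
        hits = []
        for doc_id, sig in enumerate(self.sigs):
            # The signature test filters candidates; the substring test
            # removes false drops caused by colliding bits.
            if sig & query == query and term in self.docs[doc_id]:
                hits.append(doc_id)
        return hits


index = SignatureFile()
index.register("東京都の文書検索")
index.register("情報検索システム")
index.register("京都の寺")
print(index.search("検索"))  # [0, 1]
```

In this sketch a query term shorter than two characters produces an empty signature, so every document becomes a candidate and only the substring check decides the result. A practical system would presumably handle such terms differently.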