Document retrieval in natural languages with rich morphology, particularly in terms of derivation and (single-word) composition, suffers serious performance degradation under the direct query-term-to-text-word matching paradigm that underlies the vast majority of current search engines. We propose an alternative approach in which morphologically complex word forms, whether they appear in the query or in the documents, are segmented into relevant subwords (such as stems, named entities, and acronyms), and these subwords are then submitted to the matching procedure. We evaluate our approach with the AltaVista™ Search Engine on a large medical document collection.
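To make the idea concrete, the following is a minimal sketch of subword-based matching, not the system evaluated here. It assumes a tiny hypothetical subword lexicon (`SUBWORDS`) and a greedy longest-match segmenter; the names `segment`, `subword_set`, and `matches` are illustrative only, and the actual lexicon and segmentation procedure are not specified in this abstract.

```python
# Hypothetical toy subword lexicon (an assumption for illustration only).
SUBWORDS = {"gastr", "o", "enter", "itis", "hepat"}


def segment(word, lexicon=SUBWORDS):
    """Greedy longest-match segmentation of a word into lexicon subwords.

    Falls back to the unsegmented (lowercased) word if the greedy pass
    cannot cover the whole word. No backtracking is attempted.
    """
    word = word.lower()
    parts = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in lexicon:
                parts.append(word[i:j])
                i = j
                break
        else:
            # No subword starts at position i: give up on segmentation.
            return [word]
    return parts


def subword_set(text):
    """Collect the subwords of every whitespace-separated token in the text."""
    units = set()
    for token in text.split():
        units.update(segment(token))
    return units


def matches(query, document):
    """True if every query subword also occurs among the document subwords."""
    return subword_set(query) <= subword_set(document)


if __name__ == "__main__":
    print(segment("gastroenteritis"))
    print(matches("enteritis", "acute gastroenteritis"))
    print(matches("hepatitis", "acute gastroenteritis"))
```

Under these assumptions, the query "enteritis" matches a document containing only the compound "gastroenteritis", which plain word-to-word matching would miss, while "hepatitis" does not match because its stem is absent from the document.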