Dictionary-based translation is a traditional approach used by cross-language information retrieval systems. However, significant performance degradation is often observed when queries contain words that do not appear in the dictionary. This is called the Out of Vocabulary (OOV) problem. In recent years, Web mining has been shown to be an effective approach for solving this problem. However, how to extract Multiword Lexical Units (MLUs) from Web content and how to select the correct translations from the extracted candidate MLUs remain two difficult problems in Web-mining-based automated translation approaches.

Most statistical approaches to MLU extraction rely on statistical information extracted from huge corpora. When Web mining techniques are used for automated translation, these approaches do not perform well because the corpus is usually too small, and statistical approaches that rely on a large sample can become unreliable. In this paper, we present a new Chinese term measurement and a new Chinese MLU extraction process that work well on small corpora. We also present our approach to selecting MLUs more accurately. Our experiments show marked improvement in translation accuracy over other commonly used approaches.