Bilingual lexicon extraction from comparable corpora is constrained by the small amount of available data when dealing with specialized domains. This aspect penalizes the performance of distributional-based approaches, which is closely related to the reliability of word's cooccurrence counts extracted from comparable corpora. A solution to avoid this limitation is to associate external resources with the comparable corpus. Since bilingual word embeddings have recently shown efficient models for learning bilingual distributed representation of words, we explore different word embedding models and show how a general-domain comparable corpus can enrich a specialized comparable corpus via neural networks.
展开▼