This paper presents ongoing research on the automatic extraction of bilingual lexicons from English-Japanese parallel corpora. The main objective of this paper is to examine various N-gram models for generating translation units for bilingual lexicon extraction. Three N-gram models are compared: a baseline model (Bound-length N-gram) and two new models (Chunk-bound N-gram and Dependency-linked N-gram). An experiment with 10,000 English-Japanese parallel sentences shows that Chunk-bound N-gram produces the best result in terms of both accuracy (83%) and coverage (60%), improving on the previously proposed baseline model by approximately 13% in accuracy and by 5-9% in coverage.