In this paper, we explore a cross-language information retrieval (CLIR) based approach to constructing large-scale Chinese-English comparable corpora, which are valuable for translation knowledge mining. The initial source and target document sets are crawled from news websites and uniformly standardized. Keywords are first extracted from each source document. The extracted keywords are then translated and combined into query words according to certain criteria, and these queries are run against an index built from the target document set. Meanwhile, mapping correlations between source and target documents are established from the similarity values calculated by the retrieval tool. Two methods are evaluated for filtering the comparable document pairs so as to ensure the quality of the comparable corpora. Experimental results indicate that our approach is effective for constructing Chinese-English comparable corpora.
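The abstract does not name the keyword extractor, the translation resource, the query-combination criteria, the retrieval tool, or the two filtering methods. The following is only a minimal sketch of the overall pipeline under stated assumptions: documents are already tokenized (Chinese word segmentation done beforehand), keywords are the top-k TF-IDF terms, translation uses a toy bilingual dictionary, the retrieval tool is replaced by TF-IDF cosine similarity over the target set, and filtering is a single similarity threshold. All function names, parameters (`build_pairs`, `k`, `threshold`) and data are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """TF-IDF weights for tokenized documents, using a smoothed IDF."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    vecs = []
    for doc in docs:
        tf = Counter(doc)
        vecs.append({t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1)
                     for t, c in tf.items()})
    return vecs


def top_keywords(vec, k):
    """Top-k terms by weight (ties broken alphabetically for determinism)."""
    return [t for t, _ in sorted(vec.items(), key=lambda x: (-x[1], x[0]))[:k]]


def translate_query(keywords, dictionary):
    # Hypothetical combination criterion: union of all dictionary
    # translations; keywords with no dictionary entry are dropped.
    query = []
    for kw in keywords:
        query.extend(dictionary.get(kw, []))
    return query


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_pairs(source_docs, target_docs, dictionary, k=3, threshold=0.1):
    """Map each source doc to its most similar target doc; keep pairs above threshold."""
    src_vecs = tfidf_vectors(source_docs)
    tgt_vecs = tfidf_vectors(target_docs)  # stands in for the target-side index
    pairs = []
    for i, sv in enumerate(src_vecs):
        qvec = Counter(translate_query(top_keywords(sv, k), dictionary))
        best_score, best_j = max((cosine(qvec, tv), j) for j, tv in enumerate(tgt_vecs))
        if best_score >= threshold:  # hypothetical single-threshold filter
            pairs.append((i, best_j, round(best_score, 3)))
    return pairs


# Toy data (hypothetical), already segmented into tokens.
source_docs = [
    ["地震", "日本", "救援", "日本"],
    ["股市", "上涨", "银行"],
    ["天气", "晴朗"],
]
target_docs = [
    ["stock", "market", "rises", "bank", "shares"],
    ["earthquake", "japan", "rescue", "teams", "japan"],
    ["football", "match", "result"],
]
dictionary = {
    "地震": ["earthquake"], "日本": ["japan"], "救援": ["rescue", "relief"],
    "股市": ["stock", "market"], "上涨": ["rises"], "银行": ["bank"],
    "天气": ["weather"],
}

pairs = build_pairs(source_docs, target_docs, dictionary)
print(pairs)  # [(0, 1, 0.756), (1, 0, 0.894)]
```

The third source document has no matching target terms, so its best score is 0 and the threshold drops it. Keeping only the top-scoring target per source document yields one-to-one alignments; a real system might instead keep several candidates above the threshold.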