METHODS FOR AUTOMATIC GENERATION OF PARALLEL CORPORA
Abstract
A method of forming parallel corpora comprises receiving sets of items in a first language and a second language, each of the sets having one or more associated descriptions and metadata. The metadata is collected from the two sets of items, and the two sets are aligned using the metadata. The aligned metadata are mapped from the first language to the second language for each of the sets. The descriptions of two items are fetched, and the structural similarity of the descriptions is measured to assess whether the two items are likely to be translations of each other. For mapped items with structurally similar descriptions, the mapped item descriptions are formed into respective sentences in the first language and the second language. These sentences form parallel corpora, which may be used to translate an item from the first language to the second language and to train a machine translation system.
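The abstract does not say how items are aligned or how structural similarity is measured. The Python sketch below is one possible reading, not the patented method. All names are hypothetical: `Item`, `align_by_metadata`, `structural_similarity`, `build_parallel_corpus`, the `value_map` lookup that maps first-language metadata values to second-language ones, the four structural cues (sentence count, length ratio, shared numbers, punctuation pattern) and the 0.75 threshold. For simplicity, it folds the alignment and metadata-mapping steps into a single lookup.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Item:
    """Hypothetical catalog item: a free-text description plus metadata."""
    item_id: str
    description: str
    metadata: dict = field(default_factory=dict)


def align_by_metadata(items_a, items_b, keys, value_map=None):
    """Pair items across languages whose metadata agree on every key in `keys`.

    `value_map` (assumed) maps first-language metadata values to their
    second-language equivalents; unmapped values are compared unchanged.
    """
    value_map = value_map or {}
    index = {}
    for item in items_b:
        signature = tuple(item.metadata.get(k) for k in keys)
        index.setdefault(signature, []).append(item)
    pairs = []
    for item in items_a:
        raw = [item.metadata.get(k) for k in keys]
        if None in raw:
            continue  # an alignment key is missing, so the item cannot be aligned
        signature = tuple(value_map.get(v, v) for v in raw)
        pairs.extend((item, match) for match in index.get(signature, []))
    return pairs


def split_sentences(text):
    """Naive splitter after ., ! or ? (a real system would use a proper segmenter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def structural_similarity(text_a, text_b):
    """Score in [0, 1]: the mean of four assumed language-independent cues."""
    sents_a, sents_b = split_sentences(text_a), split_sentences(text_b)
    sentence_score = min(len(sents_a), len(sents_b)) / max(len(sents_a), len(sents_b), 1)
    length_score = min(len(text_a), len(text_b)) / max(len(text_a), len(text_b), 1)
    # Numbers (sizes, quantities) usually survive translation unchanged.
    nums_a = Counter(re.findall(r"\d+", text_a))
    nums_b = Counter(re.findall(r"\d+", text_b))
    union = sum((nums_a | nums_b).values())
    number_score = sum((nums_a & nums_b).values()) / union if union else 1.0
    # Compare the sequence of punctuation marks in each description.
    punct_a = re.sub(r"[^.,;:!?()]", "", text_a)
    punct_b = re.sub(r"[^.,;:!?()]", "", text_b)
    punct_score = SequenceMatcher(None, punct_a, punct_b).ratio()
    return (sentence_score + length_score + number_score + punct_score) / 4


def build_parallel_corpus(items_a, items_b, keys, value_map=None, threshold=0.75):
    """Return (first-language, second-language) sentence pairs from aligned items."""
    corpus = []
    for item_a, item_b in align_by_metadata(items_a, items_b, keys, value_map):
        if structural_similarity(item_a.description, item_b.description) < threshold:
            continue  # descriptions are unlikely to be translations of each other
        sents_a = split_sentences(item_a.description)
        sents_b = split_sentences(item_b.description)
        if len(sents_a) == len(sents_b):
            corpus.extend(zip(sents_a, sents_b))  # assumes sentence order is preserved
        else:
            corpus.append((item_a.description, item_b.description))
    return corpus


# Usage with made-up English/German catalog data.
english = [
    Item("en-1", "Red cotton shirt. Available in size 40.", {"sku": "A100", "color": "red"}),
    Item("en-2", "Blue jeans.", {"sku": "B200", "color": "blue"}),
]
german = [
    Item("de-1", "Rotes Baumwollhemd. Erhältlich in Größe 40.", {"sku": "A100", "color": "rot"}),
    Item("de-2", "Blaue Jeans mit 5 Taschen, Größe 32; waschbar (40 Grad).", {"sku": "B200", "color": "blau"}),
]
pairs = build_parallel_corpus(
    english, german, keys=["sku", "color"], value_map={"red": "rot", "blue": "blau"}
)
# en-1/de-1 yield two sentence pairs; en-2/de-2 align on metadata but are
# rejected as structurally dissimilar.
print(pairs)
```

Both item pairs align on metadata, but only the first passes the similarity check. The second German description carries extra numbers and punctuation that the English one lacks. Using cheap surface cues like these is a design choice that avoids needing a translation model before a corpus exists. A production system would likely tune the cues and threshold on held-out data.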