Parallel corpus has recently become an indispensable resource in multilingual natural language processing. Manual preparation of a bilingual corpus is a laborious task. Therefore methods for the automated creation of parallel corpus are currently a topic of concern for many researchers. A number of sophisticated and effective algorithms for collecting parallel texts from the Web have already been created. Unfortunately, none of them have been used in the process of Polish-German corpus creation. That is why the aim of the research has been to verify the efficiency of existing algorithms for the collecting of Polish-German parallel corpus, intended as a reference source for a Machine Translation system, to propose a new algorithm and present results achieved by the new algorithm.
展开▼