Compiling and Filtering Parlce: An English-Icelandic Parallel Corpus

Abstract

We present Parlce, a new English-Icelandic parallel corpus. This is the first parallel corpus built for the purposes of language technology development and research for Icelandic, although some Icelandic texts can be found in various other multilingual parallel corpora. We map which Icelandic texts are available for these purposes, collect and filter aligned data, align other bilingual texts we acquired and describe the alignment and filtering processes. After filtering, our corpus includes 39 million Icelandic words in 3.5 million segment pairs. We estimate that our filtering process reduced the number of faulty segments in the corpus by more than 60% while only reducing the number of good alignments by approximately 9%.
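The figures in the abstract can be turned into a small worked calculation. The sketch below is illustrative only and is not the authors' method: it treats the estimated rates (60% of faulty segments removed, 9% of good alignments removed) as exact, and the starting shares of faulty segments are hypothetical values chosen for the example, since the abstract does not state one. The function name `faulty_share_after` is likewise made up here.

```python
def faulty_share_after(faulty_share_before, faulty_removed=0.60, good_removed=0.09):
    """Share of faulty segment pairs left after filtering.

    faulty_removed and good_removed default to the estimates quoted in the
    abstract; faulty_share_before is a hypothetical input.
    """
    faulty_left = faulty_share_before * (1.0 - faulty_removed)
    good_left = (1.0 - faulty_share_before) * (1.0 - good_removed)
    return faulty_left / (faulty_left + good_left)


# Average Icelandic words per segment pair, from the abstract's rounded totals.
words_per_pair = 39_000_000 / 3_500_000

if __name__ == "__main__":
    for before in (0.05, 0.10, 0.20):  # hypothetical starting shares
        after = faulty_share_after(before)
        print(f"{before:.0%} faulty before filtering -> {after:.1%} after")
    print(f"about {words_per_pair:.1f} Icelandic words per segment pair")
```

Because the abstract reports "more than 60%" for faulty segments, the shares computed this way are best read as rough upper bounds under the stated assumptions.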

Bibliographic record

Source
Nordic Conference of Computational Linguistics | 2019 | xx 410 p. | 6 pages
Authors
Starkadur Barkarson; Steinþor Steingrimsson
Original format: PDF
Chinese Library Classification: Programming, software engineering
Date added: 2022-08-20 20:19:27
