Generating English-Persian Parallel Corpus Using an Automatic Anchor Finding Sentence Aligner

Abstract

The larger a parallel bilingual corpus is, the more effective and powerful it becomes. Building such corpora requires special effort, both in collecting as many already-translated texts as possible and in designing sentence alignment algorithms with the lowest possible time complexity. In this paper, we propose algorithms that align the sentences of two Persian-English texts in linear time and with surprisingly high accuracy. The linear time complexity is achieved through our new language-independent anchor finding algorithm, which enables us to align a parallel text as large as a whole book in a single attempt and with high accuracy. As far as we know, this project is the first automatic construction of an English-Persian sentence-level parallel corpus.
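
The abstract does not say how anchors are found or how sentences are aligned between them, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes a hypothetical anchor heuristic (sentences that share a number occurring exactly once in each text) and a simple character-length cost; the names `find_anchors`, `align_segment`, `align`, `SKIP_PENALTY`, and `MERGE_PENALTY` are all invented for illustration. The point it tries to show is that anchors split the texts into short segments, so the quadratic dynamic program runs only inside each segment and the total work stays close to linear when anchors are frequent.

```python
import re
from typing import Dict, List, Tuple

Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)

SKIP_PENALTY = 20.0   # assumed cost of leaving a sentence unaligned (1-0 or 0-1)
MERGE_PENALTY = 10.0  # assumed cost of a 2-1 or 1-2 bead


def find_anchors(src: List[str], tgt: List[str]) -> List[Tuple[int, int]]:
    """Pair sentences sharing a number that occurs exactly once on each side."""
    def unique_numbers(sents: List[str]) -> Dict[int, int]:
        seen: Dict[int, List[int]] = {}
        for i, sent in enumerate(sents):
            # int() also accepts non-ASCII decimal digits, e.g. Persian ones.
            for num in {int(x) for x in re.findall(r"\d+", sent)}:
                seen.setdefault(num, []).append(i)
        return {n: idx[0] for n, idx in seen.items() if len(idx) == 1}

    s_nums, t_nums = unique_numbers(src), unique_numbers(tgt)
    pairs = sorted((s_nums[n], t_nums[n]) for n in s_nums if n in t_nums)
    anchors, last_i, last_j = [], -1, -1
    for i, j in pairs:  # greedy filter: keep pairs increasing in both texts
        if i > last_i and j > last_j:
            anchors.append((i, j))
            last_i, last_j = i, j
    return anchors


def align_segment(src: List[str], tgt: List[str]) -> List[Bead]:
    """Length-based dynamic program over one short segment between anchors."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    moves = [(1, 1, 0.0), (1, 0, SKIP_PENALTY), (0, 1, SKIP_PENALTY),
             (2, 1, MERGE_PENALTY), (1, 2, MERGE_PENALTY)]
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in moves:
                a, b = i + di, j + dj
                if a > n or b > m:
                    continue
                len_s = sum(len(s) for s in src[i:a])
                len_t = sum(len(t) for t in tgt[j:b])
                c = cost[i][j] + abs(len_s - len_t) + penalty
                if c < cost[a][b]:
                    cost[a][b], back[a][b] = c, (i, j)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):  # walk back from the end of the segment
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


def align(src: List[str], tgt: List[str]) -> List[Bead]:
    """Align whole texts: anchors become 1-1 beads, gaps are aligned locally."""
    anchors = find_anchors(src, tgt) + [(len(src), len(tgt))]  # end sentinel
    result: List[Bead] = []
    pi = pj = 0
    for ai, aj in anchors:
        for s_idx, t_idx in align_segment(src[pi:ai], tgt[pj:aj]):
            result.append(([pi + k for k in s_idx], [pj + k for k in t_idx]))
        if ai < len(src):  # a real anchor, not the sentinel
            result.append(([ai], [aj]))
        pi, pj = ai + 1, aj + 1
    return result
```

Calling `align(english_sentences, persian_sentences)` on two lists of sentence strings returns a list of index pairs. A real aligner would need a better anchor criterion and a cost that accounts for the typical length ratio between the two languages.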


Bibliographic details

Source
Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering, 2010, pp. 1-6 (6 pages)
Conference location: Beijing (CN)
Authors
Meisam VOSOUGHPOUR YAZDCHI; Heshaam FAILI
Author affiliation

School of Electrical and Computer Engineering, College of Engineering, University of Tehran

Original format: PDF
Language: English
Chinese Library Classification: Computer software

Similar literature

1. Learning English-Chinese bilingual word representations from sentence-aligned parallel corpus [J]. Yen An-Zi, Huang Hen-Hsen, Chen Hsin-Hsi. Computer Speech and Language, 2019, issue JULa.
2. Finding Schwa: Comparing the Results of an Automatic Aligner with Human Judgments When Identifying Schwa in a Corpus of Spoken French [J]. Peter M Milne. Canadian Acoustics, 2011, issue 3.
3. Generating English-Persian parallel corpus using an automatic anchor finding sentence aligner [C]. Vosoughpour Yazdchi Meisam, Faili Heshaam. International Conference on Natural Language Processing and Knowledge Engineering, 2010.
4. Automatic Text Summarization Using Importance of Sentences for Email Corpus [D]. Nadella, Sravan. 2015.
5. Extracting Parallel Sentences from Nonparallel Corpora Using Parallel Hierarchical Attention Network [O]. Shaolin Zhu, Yong Yang, Chun Xu. 2020.
6. The introduction of criteria for assessing an aligned parallel Persian-English corpus at the sentence level [O]. Masoumeh Mashayekhi, Morteza Analoui, Behrouz Minaei Bidgoli. 2019.
