Published in: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering

Generating English-Persian Parallel Corpus Using an Automatic Anchor Finding Sentence Aligner

Abstract

The larger a parallel bilingual corpus can be made, the more effective and powerful it becomes. Building such corpora demands special effort, both in collecting as many already-translated texts as possible and in designing sentence alignment algorithms with the lowest possible time complexity. In this paper, we propose algorithms that align the sentences of paired Persian-English texts in linear time and with surprisingly high accuracy. The linear time complexity is achieved through our new language-independent anchor-finding algorithm, which lets us align a parallel text as large as an entire book in a single run with high accuracy. To the best of our knowledge, this project is the first automatic construction of an English-Persian sentence-level parallel corpus.
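The abstract does not say how anchors are found or how the text between anchors is aligned. Purely as an illustration of the general anchor-then-align idea, the Python sketch below makes assumptions that are not taken from the paper: an anchor is a sentence pair whose sequence of numbers (ASCII or Persian digits) occurs exactly once in each text; crossing anchors are discarded by keeping a longest increasing subsequence; and each gap between anchors is aligned with a simplified, Gale-Church-style length-based dynamic program. The names `numbers`, `find_anchors`, `align_segment`, `align` and the bead penalties in `PENALTY` are all hypothetical.

```python
import re
from bisect import bisect_left

# Hypothetical bead shapes (source sentences, target sentences) and penalties.
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
PENALTY = {(1, 1): 0.0, (1, 0): 1.0, (0, 1): 1.0, (2, 1): 0.3, (1, 2): 0.3}


def numbers(sentence):
    """Assumed language-independent signature: every digit run, normalised.

    Python's regex digit class and int() both accept Persian digits,
    so "۱۹۵۰" and "1950" yield the same signature.
    """
    return tuple(str(int(tok)) for tok in re.findall(r"\d+", sentence))


def unique_signatures(sentences):
    """Map each non-empty signature that occurs exactly once to its index."""
    positions = {}
    for i, s in enumerate(sentences):
        sig = numbers(s)
        if sig:
            positions.setdefault(sig, []).append(i)
    return {sig: idx[0] for sig, idx in positions.items() if len(idx) == 1}


def find_anchors(src, tgt):
    """Candidate anchors reduced to a longest non-crossing chain, O(k log k)."""
    s_sigs, t_sigs = unique_signatures(src), unique_signatures(tgt)
    pairs = sorted((i, t_sigs[sig]) for sig, i in s_sigs.items() if sig in t_sigs)
    tails, tail_idx, prev = [], [], [-1] * len(pairs)
    for k, (_, j) in enumerate(pairs):  # longest increasing subsequence on j
        pos = bisect_left(tails, j)
        if pos == len(tails):
            tails.append(j)
            tail_idx.append(k)
        else:
            tails[pos] = j
            tail_idx[pos] = k
        prev[k] = tail_idx[pos - 1] if pos > 0 else -1
    chain, k = [], (tail_idx[-1] if tail_idx else -1)
    while k != -1:
        chain.append(pairs[k])
        k = prev[k]
    return chain[::-1]


def align_segment(src, tgt):
    """Length-based DP over one gap between anchors (quadratic in gap size)."""
    n, m = len(src), len(tgt)
    cost = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == float("inf"):
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + PENALTY[(di, dj)] + abs(ls - lt) / (ls + lt + 1)
                if c < cost[ni][nj]:
                    cost[ni][nj], back[ni][nj] = c, (i, j)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):  # walk back-pointers to recover the beads
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


def align(src, tgt):
    """Anchors split the texts into gaps; each gap is aligned independently."""
    result, ps, pt = [], 0, 0
    for a_s, a_t in find_anchors(src, tgt) + [(len(src), len(tgt))]:
        for s_idx, t_idx in align_segment(src[ps:a_s], tgt[pt:a_t]):
            result.append(([ps + k for k in s_idx], [pt + k for k in t_idx]))
        if a_s < len(src):  # skip the end-of-text sentinel
            result.append(([a_s], [a_t]))
        ps, pt = a_s + 1, a_t + 1
    return result


src = ["Chapter 1.", "He was born in 1950.", "He moved to Tehran.",
       "In 1975 he returned.", "The end."]
tgt = ["فصل ۱.", "او در ۱۹۵۰ به دنیا آمد.", "او به تهران رفت.",
       "در ۱۹۷۵ بازگشت.", "پایان."]
print(align(src, tgt))
# → [([0], [0]), ([1], [1]), ([2], [2]), ([3], [3]), ([4], [4])]
```

Under these assumptions, anchor selection costs O(k log k) for k candidate anchors, while each gap costs time quadratic in its own length, so the total stays close to linear only when anchors are dense enough to keep every gap short. The length cost also assumes English and Persian sentences have comparable character lengths; calibrating a length ratio on real data would likely be needed.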