Source: ACM Transactions on Asian and Low-Resource Language Information Processing

An Approach to Construct a Named Entity Annotated English-Vietnamese Bilingual Corpus

Abstract

Manually annotating named entities (NEs) in a bilingual corpus is time-consuming, labor-intensive, and expensive, yet such corpora are necessary for natural language processing (NLP) tasks such as cross-lingual information retrieval, cross-lingual information extraction, and machine translation. In this article, we present an automatic approach that constructs an NE-annotated English-Vietnamese bilingual corpus from a bilingual parallel corpus by means of a proposed NE alignment method. Starting from a bilingual corpus in which the initial NEs are extracted separately for each language, the approach tries to correct unrecognized or incorrectly recognized NEs before aligning the NEs using a variety of bilingual constraints. The generated corpus not only improves the NE recognition results but also provides alignments between English NEs and Vietnamese NEs, which are necessary for training NE translation models. The experimental results show that the approach outperforms the baseline methods. In the English-Vietnamese NE alignment task, the F-measure increases from 68.58% to 79.77%. The proposed method also significantly improves NE recognition quality: the F-measure goes from 84.85% to 88.66% on the English side and from 75.71% to 85.55% on the Vietnamese side. When the additional semantic information is provided to machine translation systems, the BLEU score increases from 33.04% to 45.11%.
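The abstract reports F-measure scores for NE alignment without stating how they are computed. As a rough illustration only, the sketch below assumes the balanced F-measure (F1) over exact-match alignment pairs; the function name `f_measure`, the pair representation, and the toy data are hypothetical and are not taken from the article.

```python
from typing import Set, Tuple

# An alignment is modeled here as an (English NE, Vietnamese NE) pair.
# This representation is an assumption made for illustration.
Alignment = Tuple[str, str]


def f_measure(predicted: Set[Alignment], gold: Set[Alignment]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted NE alignment pairs.

    A predicted pair counts as correct only if it matches a gold pair exactly.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data invented for this example (not from the article).
gold = {
    ("Hanoi", "Hà Nội"),
    ("Vietnam", "Việt Nam"),
    ("United Nations", "Liên Hợp Quốc"),
}
predicted = {
    ("Hanoi", "Hà Nội"),
    ("Vietnam", "Việt Nam"),
    ("United", "Liên Hợp Quốc"),  # wrong English boundary, so no exact match
}

p, r, f1 = f_measure(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

In this toy case the third pair fails only because the English NE boundary is wrong, which is the kind of recognition error the abstract says the approach tries to correct before alignment.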