Conference: ANLP 2011
Building a Multilingual Named Entity-Annotated Corpus Using Annotation Projection

Abstract

As developers of a highly multilingual named entity recognition (NER) system, we face an evaluation resource bottleneck problem: we need evaluation data in many languages, the annotation should not be too time-consuming, and the evaluation results across languages should be comparable. We solve the problem by automatically annotating the English version of a multi-parallel corpus and by projecting the annotations into all the other language versions. For the translation of English entities, we use a phrase-based statistical machine translation system as well as a lookup of known names from a multilingual name database. For the projection, we incrementally apply different methods: perfect string matching, perfect consonant signature matching and edit distance similarity. The resulting annotated parallel corpus will be made available for reuse.
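
The incremental projection described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the vowel set used to build a consonant signature, the 0.8 similarity threshold and the shortest-span-first search are all assumptions made for illustration. The candidate translations are assumed to be supplied already (for example by a machine translation system or a name lookup), and the sketch only shows how the three matching stages could be applied in order to locate an entity in a target-language sentence.

```python
import unicodedata

# Assumption: vowels are a/e/i/o/u/y after diacritics are stripped.
VOWELS = set("aeiouy")


def normalize(text):
    """Lowercase the text and strip combining diacritics."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def consonant_signature(text):
    """Keep only the consonant letters of the normalized text."""
    return "".join(
        ch for ch in normalize(text) if ch.isalpha() and ch not in VOWELS
    )


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Edit distance rescaled to the range [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def project_entity(candidates, target_tokens, threshold=0.8):
    """Return (start, end, method) for the matched token span, or None.

    candidates: hypothetical list of translations of one English entity.
    threshold: assumed minimum similarity for the last stage.
    """
    if not candidates:
        return None
    max_len = max(len(c.split()) for c in candidates) + 1
    # Enumerate token spans, shortest first, so the tightest match wins.
    spans = []
    for length in range(1, min(max_len, len(target_tokens)) + 1):
        for start in range(len(target_tokens) - length + 1):
            end = start + length
            spans.append((start, end, " ".join(target_tokens[start:end])))

    # Stage 1: perfect string match.
    for start, end, text in spans:
        if text in candidates:
            return start, end, "exact"

    # Stage 2: perfect consonant signature match.
    signatures = {consonant_signature(c) for c in candidates} - {""}
    for start, end, text in spans:
        if consonant_signature(text) in signatures:
            return start, end, "consonant"

    # Stage 3: best edit-distance similarity above the threshold.
    best = None
    for start, end, text in spans:
        for cand in candidates:
            score = similarity(normalize(text), normalize(cand))
            if score >= threshold and (best is None or score > best[0]):
                best = (score, start, end)
    if best is not None:
        return best[1], best[2], "edit_distance"
    return None


if __name__ == "__main__":
    sentence = "Hier , Vladimir Poutine a parlé".split()
    print(project_entity(["Vladimir Putin"], sentence))
```

Running the stages in this order means the cheapest and most precise test is tried first, and the fuzzier ones are reached only when it fails. In the usage example the French spelling "Poutine" fails the exact match but shares a consonant signature with "Putin", so the second stage finds the span.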
