International Conference on Language Resources and Evaluation

Annotated Corpora for Word Alignment Between Japanese and English and its Evaluation with MAP-based Word Aligner




This paper presents two annotated corpora for word alignment between Japanese and English. The annotations were made on top of the IWSLT-2006 and NTCIR-8 corpora. The IWSLT-2006 corpus covers the travel conversation domain, while the NTCIR-8 corpus covers the patent domain. We annotated the first 500 sentence pairs of the IWSLT-2006 corpus and the first 100 sentence pairs of the NTCIR-8 corpus. After describing the annotation guidelines, we present two evaluation algorithms that make use of such hand-annotated corpora: one is well known to word alignment researchers, while the other is novel and is intended to evaluate the MAP-based word aligner of Okita et al. (2010b).
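The abstract does not name the well-known evaluation algorithm. As an illustration only, the sketch below assumes it resembles the widely used alignment error rate (AER), which compares predicted links against hand-annotated "sure" and "possible" links. The function name, the link representation as (source index, target index) pairs, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# A link pairs a source-word index with a target-word index (assumed representation).
Link = Tuple[int, int]


def alignment_scores(predicted: Set[Link],
                     sure: Set[Link],
                     possible: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, AER) for one set of predicted links.

    Assumption: gold annotation distinguishes sure links (S) from
    possible links (P), and every sure link also counts as possible.
    """
    if not predicted or not sure:
        raise ValueError("predicted and sure link sets must be non-empty")

    possible = possible | sure           # enforce S as a subset of P
    hits_possible = len(predicted & possible)
    hits_sure = len(predicted & sure)

    precision = hits_possible / len(predicted)
    recall = hits_sure / len(sure)
    aer = 1.0 - (hits_sure + hits_possible) / (len(predicted) + len(sure))
    return precision, recall, aer


# Toy example (invented data, not from the annotated corpora).
gold_sure = {(0, 0), (1, 1)}
gold_possible = {(2, 1)}
system_links = {(0, 0), (2, 1), (3, 3)}

precision, recall, aer = alignment_scores(system_links, gold_sure, gold_possible)
print(f"precision={precision:.3f} recall={recall:.3f} AER={aer:.3f}")
```

Lower AER means closer agreement with the hand annotation. The paper's novel algorithm for the MAP-based aligner is not described in this abstract, so it is not sketched here.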


