Conference: Natural language processing Pacific Rim symposium
GenusAlign: Word Alignment based on Genus Terms in a MRD

Abstract

Aligned parallel corpora have proved very useful in many natural language processing tasks, including statistical machine translation and word sense disambiguation. In this paper, we address two issues in current word alignment research: coverage and resource requirements. In addressing these issues, we discuss the central problems of data sparseness and noise in the knowledge acquisition process and suggest an approach based on a bilingual machine-readable dictionary (MRD). We describe an MRD-based method for word alignment called GenusAlign, which relies on genus terms to cluster dictionary entries of headwords and translations. These genus-based clusters are especially effective for aligning suffixes that carry semantic features such as person, time, and tool. Although it does not require a very large bilingual corpus, the GenusAlign algorithm rivals corpus-based methods in both coverage and precision.
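
The abstract does not give the details of GenusAlign, so the following is only a minimal sketch of the general idea of clustering dictionary entries by genus term and reading off suffix correspondences per cluster. The toy entries, the first-non-article heuristic for picking a genus term, and the function names (`genus_term`, `cluster_by_genus`, `dominant_suffix_pairs`) are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter, defaultdict

# Toy bilingual MRD entries: (headword, English definition, translation).
# These entries are invented for illustration; they are not data from the paper.
ENTRIES = [
    ("writer", "a person who writes books", "作家"),
    ("painter", "a person who paints pictures", "畫家"),
    ("composer", "a person who writes music", "作曲家"),
    ("mixer", "a device for mixing food", "攪拌器"),
    ("heater", "a device that heats air or water", "加熱器"),
    ("opener", "a device for opening bottles", "開瓶器"),
]

ARTICLES = {"a", "an", "the"}


def genus_term(definition):
    """Assumed heuristic: the first non-article word of the definition."""
    for word in definition.lower().split():
        if word not in ARTICLES:
            return word
    return None


def cluster_by_genus(entries):
    """Group (headword, translation) pairs under the genus term of their definition."""
    clusters = defaultdict(list)
    for headword, definition, translation in entries:
        clusters[genus_term(definition)].append((headword, translation))
    return dict(clusters)


def dominant_suffix_pairs(clusters, suffix_len=2):
    """For each cluster, pair the most frequent headword suffix with the
    most frequent final character of the translations."""
    result = {}
    for genus, pairs in clusters.items():
        source_suffixes = Counter(head[-suffix_len:] for head, _ in pairs)
        target_suffixes = Counter(trans[-1] for _, trans in pairs)
        result[genus] = (
            source_suffixes.most_common(1)[0][0],
            target_suffixes.most_common(1)[0][0],
        )
    return result


if __name__ == "__main__":
    clusters = cluster_by_genus(ENTRIES)
    for genus, (src, tgt) in dominant_suffix_pairs(clusters).items():
        print(f"{genus}: -{src} <-> {tgt}")
```

In this toy data the same English suffix "-er" pairs with a different target character in the "person" cluster than in the "device" cluster, which is the kind of distinction that grouping entries by genus term is meant to expose.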
