
Inducing Morphemes Using Light Knowledge


Abstract

Allomorphic variation, or form variation among morphs with the same meaning, is a stumbling block to morphological induction (MI). To address this problem, we present a hybrid approach that uses a small amount of linguistic knowledge, in the form of orthographic rewrite rules, to refine an existing MI-produced segmentation. Using the rules, we derive underlying analyses of morphs, generalized with respect to contextual spelling differences, from an existing surface morph segmentation, and from these analyses we learn a morpheme-level segmentation. To learn morphemes, we have extended the Morfessor segmentation algorithm [Creutz and Lagus 2004; 2005; 2006] by using rules to infer possible underlying analyses from surface segmentations. A segmentation produced by Morfessor Categories-MAP Software v. 0.9.2 serves both as input to our procedure and as the baseline we evaluate against. The procedure requires a set of language-specific orthographic rules to suggest candidate analyses. It has yielded promising improvements over the baseline for English and Turkish when tested on Morpho Challenge 2005 and 2007 style evaluations. On the Morpho Challenge 2007 test evaluation, we report gains over the current best unsupervised contestant for Turkish, where our technique shows a 2.5% absolute F-score improvement.
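As a rough illustration of how orthographic rewrite rules might propose underlying analyses from a surface segmentation, the sketch below undoes two English spelling alternations at a stem-suffix boundary. Everything in it is an assumption made for illustration: the two rules, the function names, and the restriction to two-morph words are hypothetical and are not taken from the paper or from the Morfessor software. The rules deliberately overgenerate; choosing among the candidates is left to the learning step.

```python
from typing import List, Optional, Tuple

VOWELS = set("aeiou")


def undo_y_to_i(stem: str, suffix: str) -> Optional[str]:
    """Hypothetical rule: underlying stem-final 'y' surfaces as 'i'
    before a suffix that begins with 'e' (city + es -> citi + es)."""
    if stem.endswith("i") and suffix.startswith("e"):
        return stem[:-1] + "y"
    return None


def undo_e_deletion(stem: str, suffix: str) -> Optional[str]:
    """Hypothetical rule: underlying stem-final 'e' is deleted
    before a vowel-initial suffix (make + ing -> mak + ing)."""
    if stem and stem[-1] not in VOWELS and suffix[:1] in VOWELS:
        return stem + "e"
    return None


# Illustrative rule set; a real system would need language-specific rules.
RULES = [undo_y_to_i, undo_e_deletion]


def underlying_candidates(morphs: List[str]) -> List[Tuple[str, str]]:
    """Propose underlying analyses for a surface segmentation.

    Assumes exactly two morphs (stem, suffix) to keep the sketch short.
    The unchanged surface analysis is always the first candidate.
    """
    stem, suffix = morphs
    candidates = [(stem, suffix)]
    for rule in RULES:
        restored = rule(stem, suffix)
        if restored is not None:
            candidates.append((restored, suffix))
    return candidates


if __name__ == "__main__":
    for surface in (["citi", "es"], ["mak", "ing"], ["walk", "ed"]):
        print(surface, "->", underlying_candidates(surface))
```

Note that `["walk", "ed"]` also yields the spurious candidate `("walke", "ed")`: the rules only suggest analyses, and a downstream model has to decide which ones generalize across the corpus.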
