International Conference on Computational Linguistics

Data-Driven Morphological Analysis and Disambiguation for Morphologically Rich Languages and Universal Dependencies


Abstract

Parsing texts into universal dependencies (UD) in realistic scenarios requires, as a first tier, infrastructure for morphological analysis and disambiguation (MA&D) of typologically different languages. MA&D is particularly challenging in morphologically rich languages (MRLs), where ambiguous space-delimited tokens must be disambiguated with respect to their constituent morphemes. Here we present a novel, language-agnostic framework for MA&D, based on a transition system with two variants, word-based and morpheme-based, and a dedicated transition to mitigate the biases of variable-length morpheme sequences. In experiments on a Modern Hebrew case study, our framework outperforms the state of the art, and we show that the morpheme-based variant of morphological disambiguation (MD) consistently outperforms our word-based variant. We further illustrate the utility and multilingual coverage of our framework by morphologically analyzing and disambiguating the large set of languages in the UD treebanks.
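To make the phrase "variable-length morpheme sequences" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy morphological lattice for a single transliterated Hebrew token, "bclm"; the arcs, tags, and the names `TOY_LATTICE`, `enumerate_analyses`, and `decision_counts` are all hypothetical and simplified for illustration. The sketch only shows why a morpheme-by-morpheme disambiguator makes a different number of decisions for competing analyses of the same token, whereas a word-based one makes a single decision per token. It does not model the paper's dedicated transition or its scoring.

```python
from typing import Dict, List, Tuple

# An arc is (start_node, end_node, surface_form, tag).
# Toy lattice for one ambiguous token; analyses are illustrative only.
Arc = Tuple[int, int, str, str]

TOY_LATTICE: List[Arc] = [
    (0, 1, "b", "ADP"),       # preposition prefix
    (1, 2, "cl", "NOUN"),     # noun following the prefix
    (2, 3, "m", "PRON"),      # pronominal suffix
    (0, 2, "bcl", "NOUN"),    # a different noun spanning the first two segments
    (0, 3, "bclm", "PROPN"),  # the whole token read as a proper noun
]


def enumerate_analyses(arcs: List[Arc], start: int, end: int) -> List[List[Arc]]:
    """Return every path from start to end; each path is one candidate analysis."""
    by_start: Dict[int, List[Arc]] = {}
    for arc in arcs:
        by_start.setdefault(arc[0], []).append(arc)

    def walk(node: int) -> List[List[Arc]]:
        if node == end:
            return [[]]
        paths: List[List[Arc]] = []
        for arc in by_start.get(node, []):
            for rest in walk(arc[1]):
                paths.append([arc] + rest)
        return paths

    return walk(start)


def decision_counts(path: List[Arc]) -> Dict[str, int]:
    """Number of choices needed to commit to this analysis under each variant.

    Assumption: a word-based system picks a whole path in one step, while a
    morpheme-based system picks one arc per step.
    """
    return {"word_based": 1, "morpheme_based": len(path)}


if __name__ == "__main__":
    for analysis in enumerate_analyses(TOY_LATTICE, 0, 3):
        forms = "+".join(arc[2] for arc in analysis)
        print(forms, decision_counts(analysis))
```

Under these assumptions the three candidate analyses require three, two, and one morpheme-level decisions respectively, so any score accumulated per decision is not directly comparable across them. This is the kind of length bias the abstract says the dedicated transition is meant to mitigate.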
