Context Sensitive Neural Lemmatization with Lematus

Abstract

The main motivation for developing context-sensitive lemmatizers is to improve performance on unseen and ambiguous words. Yet previous systems have not carefully evaluated whether the use of context actually helps in these cases. We introduce Lematus, a lemmatizer based on a standard encoder-decoder architecture, which incorporates character-level sentence context. We evaluate its lemmatization accuracy across 20 languages in both a full-data setting and a lower-resource setting with 10k training examples in each language. In both settings, we show that including context significantly improves results against a context-free version of the model. Context helps more for ambiguous words than for unseen words, though the latter has a greater effect on overall performance differences between languages. We also compare to three previous context-sensitive lemmatization systems, which all use pre-extracted edit trees as well as hand-selected features and/or additional sources of information such as tagged training data. Without using any of these, our context-sensitive model outperforms the best competitor system (Lemming) in the full-data setting, and performs on par in the lower-resource setting.