Conference: International Joint Conference on Natural Language Processing; Conference on Empirical Methods in Natural Language Processing

A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages

Abstract

Parsers are available for only a handful of the world's languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and transliteration. Experimenting on three typologically diverse low-resource languages (North Sami, Galician, and Kazakh), we find that (1) when only the low-resource treebank is available, data augmentation is very helpful; (2) when a related high-resource treebank is available, cross-lingual training is helpful and complements data augmentation; and (3) when the high-resource treebank uses a different writing system, transliteration into a shared orthographic space is also very helpful.
