Journal: ACM Transactions on Asian Language Information Processing

Transform, Combine, and Transfer: Delexicalized Transfer Parser for Low-resource Languages


Abstract

Transfer parsing has been used to develop dependency parsers for languages with no treebank by transferring from the treebanks of other languages (source languages). In delexicalized transfer, words are replaced by their part-of-speech tags. Transfer parsing may not work well if a language does not follow a uniform syntactic structure across its different constituent patterns. Earlier work has used information derived from linguistic databases to transform a source language treebank so as to reduce the syntactic differences between the source and target languages. We propose a transformation method in which a source language pattern is transformed stochastically into one of the multiple possible patterns followed in the target language. The transformed source language treebank can be used to train a delexicalized parser for the target language. We show that this method significantly improves the average performance of single-source delexicalized transfer parsers. We also show that, in multi-source settings, parsers trained on a concatenation of transformed source language treebanks work better when a subset of the source language treebanks is used rather than all of them or only one. However, selecting the subset of treebanks whose combination gives the best-performing parser from the set of all available treebanks is hard. We propose a greedy selection heuristic based on the labelled attachment scores of the corresponding single-source parsers trained on the treebanks after transformation.
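
As a rough illustration of two ideas named in the abstract, the Python sketch below delexicalizes a treebank and then runs a greedy subset selection. It is not the authors' implementation. It assumes treebanks are stored in the 10-column CoNLL-U format, and the function names `delexicalize` and `greedy_select`, the `evaluate` callback, and the stopping rule (keep a treebank only if it raises the score) are all hypothetical. The abstract states only that selection is greedy and based on single-source labelled attachment scores (LAS), and does not say how combined parsers are scored.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def delexicalize(conllu_lines: Iterable[str]) -> List[str]:
    """Replace FORM and LEMMA with the UPOS tag on CoNLL-U token lines."""
    out: List[str] = []
    for line in conllu_lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Only ordinary token lines have 10 columns and an integer ID;
        # comments, blank lines, and multiword ranges pass through as-is.
        if len(cols) == 10 and cols[0].isdigit():
            cols[1] = cols[3]  # FORM  -> UPOS
            cols[2] = cols[3]  # LEMMA -> UPOS
            line = "\t".join(cols)
        out.append(line)
    return out


def greedy_select(
    single_source_las: Dict[str, float],
    evaluate: Callable[[Sequence[str]], float],
) -> List[str]:
    """Hypothetical greedy subset selection over source treebanks.

    Assumption: candidates are tried in decreasing order of their
    single-source LAS, and a treebank is kept only if adding it
    raises the score returned by the caller-supplied `evaluate`.
    """
    ranked = sorted(single_source_las, key=single_source_las.get, reverse=True)
    selected: List[str] = []
    best = float("-inf")
    for name in ranked:
        score = evaluate(selected + [name])
        if score > best:
            selected.append(name)
            best = score
    return selected
```

Here `evaluate` stands in for training a parser on the concatenation of the chosen transformed treebanks and scoring it. The greedy pass calls it once per candidate treebank instead of once per subset, which is what makes the search tractable when many source treebanks are available.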
