Journal: ACM Transactions on Asian and Low-Resource Language Information Processing

Multi-level Chunk-based Constituent-to-Dependency Treebank Transformation for Tibetan Dependency Parsing


Abstract

Dependency parsing is an important task in natural language processing (NLP). However, a mature parser requires a large treebank for training, which remains extremely costly to create. Tibetan is an extremely low-resource language for NLP: no Tibetan dependency treebank is publicly available, and such treebanks are currently obtained only by manual annotation. Furthermore, there is little related research on treebank construction. We propose a novel method of multi-level chunk-based syntactic parsing to complete constituent-to-dependency treebank conversion for Tibetan under resource-scarce conditions. Our method mines more dependencies from Tibetan sentences, builds a high-quality Tibetan dependency tree corpus, and makes fuller use of the inherent regularities of the language itself. We train dependency parsing models on the dependency treebank obtained by the preliminary transformation. The model achieves 86.5% accuracy, 96% LAS (labeled attachment score), and 97.85% UAS (unlabeled attachment score), which exceeds the best results of existing conversion methods. The experimental results show that our method has the potential to be used in a low-resource setting, which means we not only address the scarcity of Tibetan dependency treebanks but also avoid needless manual annotation. The method embodies the regularity of strongly knowledge-guided linguistic analysis methods, which is of great significance for promoting research on Tibetan information processing.
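For readers unfamiliar with the two attachment metrics reported above, the following is a minimal sketch of how UAS and LAS are conventionally computed. It is not the authors' evaluation code. The function name `attachment_scores`, the `(head, relation)` token encoding, and the toy labels are assumptions made for illustration. UAS counts tokens whose predicted head is correct; LAS additionally requires the relation label to match.

```python
from typing import List, Tuple

# Hypothetical encoding: each token is (head_index, relation_label),
# where head_index 0 denotes the artificial root node.
Arc = Tuple[int, str]


def attachment_scores(gold: List[List[Arc]],
                      pred: List[List[Arc]]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions over all tokens in the corpus."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    total = correct_heads = correct_labeled = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("sentence lengths differ between gold and pred")
        for (g_head, g_rel), (p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                correct_heads += 1          # counts toward UAS
                if g_rel == p_rel:
                    correct_labeled += 1    # counts toward LAS as well
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct_heads / total, correct_labeled / total


# Toy three-token sentence (illustrative labels only).
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "obj"), (0, "root"), (1, "obj")]]
uas, las = attachment_scores(gold, pred)
```

In the toy sentence, the first token has the right head but the wrong label, the second is fully correct, and the third has the wrong head, so two of three heads and one of three labeled arcs are correct. Because a labeled match requires a head match, LAS can never exceed UAS under this definition.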
