Journal: Journal of computer sciences

Tree Rotations for Dependency Trees: Converting the Head-Directionality of Noun Phrases

Abstract

To overcome the lack of NLP resources for low-resource languages, we can use tools that are already available for high-resource languages and then modify their output to conform to the target language. In this study, we proposed an approach for converting an Indonesian constituency treebank to a dependency treebank, using an English NLP tool (Stanford CoreNLP) to create the initial dependency treebank. Some annotations in this initial treebank did not conform to Indonesian grammar, especially the head-directionality of noun phrases: noun phrases in English are usually head-final, whereas in Indonesian they are head-initial. We proposed a variant of the tree rotation algorithm for dependency trees, named headSwap, and used it to convert the head-directionality of noun phrases that were initially labeled as compound. We also proposed a set of rules for renaming the dependency relation labels to conform to the recent guidelines. To evaluate the proposed method, we created a manually annotated gold standard of 2,846 tokens. Experimental results showed that the method improved the Unlabeled Attachment Score (UAS) by 32.5 percentage points, from 61.6% to 94.1%, and the Labeled Attachment Score (LAS) by 41 percentage points, from 44.1% to 85.1%. Finally, we created a new Indonesian dependency treebank of 25,416 tokens, converted automatically with the proposed method. A dependency parser model built using this treebank achieved a UAS of 75.90% and a LAS of 70.38%.
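
To make the idea of swapping a head and its dependent concrete, the following is a minimal sketch and not the paper's actual headSwap procedure, whose details the abstract does not give. It assumes a simple token record (the hypothetical `Token` class), assumes that other dependents of the old head are re-attached to the new head, and uses a toy Indonesian sentence with invented annotations. The UAS and LAS computation follows the standard definitions (share of tokens with the correct head, and with the correct head and label).

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical token record: 1-based id, word form, head id (0 = root), label."""
    id: int
    form: str
    head: int
    deprel: str


def head_swap(tokens, dep_id):
    """Rotate the edge between token `dep_id` and its head so `dep_id` becomes the head.

    Illustrative only; the re-attachment of siblings is an assumption.
    """
    by_id = {t.id: t for t in tokens}
    dep = by_id[dep_id]
    if dep.head == 0:
        raise ValueError("the root token has no head to swap with")
    old_head = by_id[dep.head]
    label = dep.deprel
    # The former dependent inherits the old head's attachment and label.
    dep.head, dep.deprel = old_head.head, old_head.deprel
    # The old head now hangs below the former dependent with the original label.
    old_head.head, old_head.deprel = dep.id, label
    # Assumption: remaining dependents of the old head move to the new head.
    for t in tokens:
        if t.head == old_head.id and t.id != dep.id:
            t.head = dep.id


def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages over aligned token lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sentences must have the same length")
    n = len(gold)
    uas = sum(g.head == p.head for g, p in zip(gold, pred))
    las = sum(g.head == p.head and g.deprel == p.deprel for g, p in zip(gold, pred))
    return 100 * uas / n, 100 * las / n


# Toy sentence: "Saya membaca buku sejarah" ("I read a history book").
# Head-initial analysis: "buku" (book) heads "sejarah" (history).
gold = [
    Token(1, "Saya", 2, "nsubj"),
    Token(2, "membaca", 0, "root"),
    Token(3, "buku", 2, "obj"),
    Token(4, "sejarah", 3, "compound"),
]
# English-style, head-final analysis of the same noun phrase.
pred = [
    Token(1, "Saya", 2, "nsubj"),
    Token(2, "membaca", 0, "root"),
    Token(3, "buku", 4, "compound"),
    Token(4, "sejarah", 2, "obj"),
]

before = attachment_scores(gold, pred)
head_swap(pred, 3)  # make "buku" the head of the noun phrase
after = attachment_scores(gold, pred)
print(before, after)
```

In this toy case only the two noun-phrase tokens are attached incorrectly before the swap, so a single rotation repairs both heads and both labels. Real treebank conversion would also need the label-renaming rules mentioned in the abstract, which this sketch does not attempt.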
